"Is ESG Dead? Investigating the Link Between ESG Scores and Stock Market"¶

ECON 323 Project - Dalton Du & Harvey Chan¶

Image generated by Midjourney AI

The primary goal of this analysis is to examine how ESG affects stock performance. Specifically, the questions we are trying to answer are:¶

Does ESG explain stock volatility (Beta), and is it useful in stock prediction? Do investors care about ESG?¶

Abstract¶

Background: ESG investing has become one of the brightest spots in the global equity market, driven by rising demand for investments that foster sustainability. But what is ESG? Do investors really care about it? This paper examines whether ESG scores can explain stock volatility and whether they are useful in stock prediction. We use visualizations of correlations and time series to examine stock volatility, and introduce a factor methodology to see whether ESG helps predict the performance of S&P 500 stocks. Methodology: We constructed an ESG+ factor portfolio, consisting of ESG factors along with other commonly used factors (profitability, leverage ratio, etc.), which we reduced to three factors via PCA. We then applied linear regression, support vector regression, random forest, and gradient boosting to those factors to predict the future returns of S&P 500 stocks. The results show that, judging by the p-values in the linear regression model, the ESG factors are not significant predictors, except for the S-score, which is negatively correlated with next-quarter stock returns. In terms of model performance on the ESG+ factors, linear regression performs best by mean squared error, while random forest performs best for stock portfolio construction. We conclude that it is important to strike a balance between over- and under-investing in ESG, and we offer advice to policymakers, corporations, and investors in our discussion and conclusion sections.
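The factor pipeline the abstract describes (standardize factors, reduce them to three components with PCA, then regress next-quarter returns on the components) can be sketched as follows. All data here are random placeholders, not the ESG+ factors used in the paper:

```python
import numpy as np

# Sketch of the pipeline: standardize a factor matrix, reduce it to three
# principal components (PCA via SVD), then run OLS of next-quarter returns
# on those components. The factor matrix and returns are synthetic.
rng = np.random.default_rng(42)
factors = rng.normal(size=(100, 6))   # stand-ins for E, S, G, profitability, leverage, ...
returns = rng.normal(size=100)        # stand-in next-quarter returns

Z = (factors - factors.mean(axis=0)) / factors.std(axis=0)   # standardize
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
pcs = Z @ Vt[:3].T                                           # first three principal components

X = np.column_stack([np.ones(len(pcs)), pcs])                # add an intercept
coef, *_ = np.linalg.lstsq(X, returns, rcond=None)           # OLS coefficients
```

The actual analysis additionally fits support vector regression, random forest, and gradient boosting on the same reduced factors.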

I - Introduction¶

1. What is ESG¶

ESG scores are a set of metrics used to evaluate and quantify a company's performance in three key areas: Environmental, Social, and Governance. These scores are designed to help investors and stakeholders assess the sustainability and ethical impact of a company, as well as its long-term resilience and performance. ESG scores are increasingly being incorporated into asset management firms’ decision-making processes (such as Blackrock, JP. Morgan, Goldman Sachs, etc.) as awareness grows about the importance of responsible and sustainable business practices.

The three components of ESG scores are:

  1. Environmental: This aspect evaluates a company's impact on the environment, including its energy use, waste management, greenhouse gas emissions, natural resource conservation, and overall ecological footprint.
  2. Social: This category assesses a company's relationship with its employees, customers, suppliers, and the broader community. Factors considered include labor practices, diversity and inclusion, human rights, consumer protection, and community engagement.
  3. Governance: This component focuses on a company's internal structure, management, and decision-making processes. It examines aspects such as board diversity and independence, executive compensation, shareholder rights, and transparency in reporting and disclosure.

ESG scores are calculated using a combination of publicly available information and proprietary research. Various organizations and rating agencies, such as Bloomberg, provide ESG scores, and their methodologies and weightings can differ. Bloomberg is a well-regarded provider of financial data, analytics, and research, and its ESG scores are considered reliable and credible. Bloomberg has offered ESG data and analytics since 2016, and they are now widely used by financial professionals.

2. ESG Investing¶

ESG investing is on fire (Eccles and Stroehle, 2018). Over 3,000 institutional investors and service providers have formally endorsed the Principles for Responsible Investment (PRI), thereby pledging to integrate Environmental, Social, and Governance (ESG) or Corporate Social Responsibility (CSR) considerations into their investment analysis and decision-making procedures (Gullian et al., 2021). However, whether ESG has been priced into stocks is still hotly disputed.

The ESG investment performance literature highlights three primary perspectives:

  1. Neutral Relationship:
  • This view suggests that markets are informationally efficient, making it impossible to achieve superior risk-adjusted returns (Fama, 1970). Hartzmark and Sussman (2019) and Managi et al. (2012) provide evidence supporting this neutral stance.
  • Efficient Market Hypothesis (EMH) posits that all available information, including ESG factors, is already reflected in asset prices, making it impossible to achieve superior risk-adjusted returns consistently.
  • Advocates of this perspective argue that if ESG information is publicly available, it is already priced into the market, leaving no room for outperformance.
  2. Underperformance:
  • Contrary to the neutral viewpoint, research by Adler and Kritzman (2008), Bauer et al. (2005), and Berlinger and Lovas (2015) indicates that ESG investments may potentially underperform. One reason is the lower diversification capability of ESG investments since they represent a subset of the market. Another explanation is that focusing on sustainability aspects might hinder short-term growth.
  • ESG investments are considered a subset of the market, which can limit diversification capabilities, leading to potential underperformance compared to broader market benchmarks.
  • The focus on sustainability aspects may constrain short-term growth as companies may need to invest in environmentally friendly practices, social initiatives, or governance improvements, which could lead to higher costs or lower immediate returns.
  • Investors' preference for ESG investments may drive up valuations, leading to lower future returns for these assets.
  3. Superior Returns:
  • In contrast to the previous perspectives, Consolandi et al. (2009) and Renneboog et al. (2008) advocate for superior returns, emphasizing the principle of "doing well while doing good." This argument asserts that incorporating ESG factors can lead to higher risk-adjusted returns by identifying companies better equipped to address long-term risks and seize opportunities related to sustainability.
  • Proponents argue that incorporating ESG factors into investment decisions helps identify companies better positioned to manage long-term risks, capitalize on opportunities associated with sustainability, and enhance stakeholder relationships.
  • Companies with strong ESG performance may demonstrate better operational efficiency, improved risk management, and enhanced brand reputation, all of which can contribute to higher long-term value creation.
  • ESG integration can lead to better-informed investment decisions by considering non-financial factors that may have material impacts on companies' financial performance over time.
In [1]:
!pip install plotly_express
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import plotly_express as px
import plotly as py
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import seaborn as sns

II - Datasets for Visualization¶

In this section, we use two main datasets to investigate those questions:

1) ESG_Disclosure_across_world - from Sustainable Stock Exchange Initiative (SSE)

  • The Sustainable Stock Exchange Initiative (SSE) monitors and tracks the progress of stock exchanges worldwide in establishing ESG guidance for listed companies. The SSE maintains a database of all guidance documents provided by stock exchanges for listed companies, which allows other stock exchanges to learn from their peers. This information can be found on the SSE website, along with more information about the SSE campaign for ESG Disclosure.

2) Industry - from Refinitiv

  • Refinitiv offers an extensive ESG database covering over 70% of the global market cap and 500 different ESG metrics. The data is transparently and objectively measured based on company-reported data, covering 10 main themes such as emissions, environmental product innovation, human rights, and shareholders. Refinitiv provides an overall ESG combined score that discounts for significant ESG controversies impacting the corporations covered. The percentile rank scores are simple to understand (available in both percentages and letter grades from D- to A+). They are benchmarked against The Refinitiv Business Classifications (TRBC – Industry Group) for all environmental and social categories, as well as the controversies score.
In [3]:
ESG_Disclosure_across_world = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/ESG%20Disclosure%20Guidance%20Database%20(1)%20(1).csv", skiprows = [291, 290])
Industry = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/GridExport_April_8_2023_21_38_44%20(1).csv", skiprows = [1])

Sustainable Stock Exchange Initiative (SSE)¶

We use ESG_Disclosure_across_world to track the progress of stock exchanges worldwide in establishing ESG guidance. Columns include:¶

  • Market: The country where the ESG guidance was launched.
  • continent: The continent of the country.
  • iso_alpha: The standard three-letter ISO code identifying the country or territory.
  • STOCK EXCHANGE: The stock exchange on which the ESG guidance was launched.
  • Year: The year the ESG guidance was launched.
  • ESG GUIDANCE: The name of the launched ESG report.
  • Stock exchage & ESG guidance: The stock exchange together with the name of the launched ESG report.
In [4]:
# Combine descriptive information from the "STOCK EXCHANGE" and "ESG GUIDANCE" columns
ESG_Disclosure_across_world["Stock exchage & ESG guidance"] = ESG_Disclosure_across_world["STOCK EXCHANGE"] + ": " + ESG_Disclosure_across_world["ESG GUIDANCE"]
ESG_Disclosure_across_world
Out[4]:
Market continent iso_alpha STOCK EXCHANGE Year ESG GUIDANCE Stock exchage & ESG guidance
0 China Asia CHN Shenzhen Stock Exchange 2006 Social Responsibility Instructions to Listed C... Shenzhen Stock Exchange: Social Responsibility...
1 China Asia CHN Shenzhen Stock Exchange 2007 Social Responsibility Instructions to Listed C... Shenzhen Stock Exchange: Social Responsibility...
2 China Asia CHN Shenzhen Stock Exchange 2008 Social Responsibility Instructions to Listed C... Shenzhen Stock Exchange: Social Responsibility...
3 China Asia CHN Shenzhen Stock Exchange 2009 Social Responsibility Instructions to Listed C... Shenzhen Stock Exchange: Social Responsibility...
4 China Asia CHN Shenzhen Stock Exchange 2010 Social Responsibility Instructions to Listed C... Shenzhen Stock Exchange: Social Responsibility...
... ... ... ... ... ... ... ...
284 Netherlands Europe NLD Euronext Amsterdam 2022 ESG Reporting Guide - Target 1.5°C Euronext Amsterdam: ESG Reporting Guide - Targ...
285 Norway Europe NOR Oslo Bors (Euronext) 2022 ESG Reporting Guide - Target 1.5°C Oslo Bors (Euronext): ESG Reporting Guide - Ta...
286 Portugal Europe PRT Euronext Lisbon 2022 ESG Reporting Guide - Target 1.5°C Euronext Lisbon: ESG Reporting Guide - Target ...
287 Russian Federation Asia RUS Moscow Exchange 2022 ESG Best Practice Guide (in Russian) Moscow Exchange: ESG Best Practice Guide (in R...
288 UK Europe GBR Euronext London 2022 ESG Reporting Guide - Target 1.5°C Euronext London: ESG Reporting Guide - Target ...

289 rows × 7 columns

Refinitiv¶

We use the Refinitiv data to compare market cap, ESG scores, and their correlation with Beta across sectors of the S&P 500. We generated a customized dataset in the Refinitiv workspace by selecting relevant columns and indicators. Columns include:¶

  • Company Name: This column provides the name of the company for which the ESG scores are being reported.

  • Sector: This column provides the sector in which the company operates.

  • Ticker Symbol: This column provides the stock ticker symbol of the company.

  • Ticker Symbol (Simplified): This column provides a simplified version of the stock ticker symbol of the company.

  • ESG Score (FY0): This column provides the ESG (Environmental, Social, and Governance) score of the company for the current fiscal year (FY0).

  • YoY Change in ESG Score in FY0 (%): This column provides the year-over-year percentage change in the ESG score of the company for the current fiscal year (FY0).

  • ESG Score (FY-1): This column provides the ESG score of the company for the previous fiscal year (FY-1).

  • ESG Score Latest Update Date (FY0): This column provides the date when the ESG score of the company for the current fiscal year (FY0) was last updated.

  • Market Cap: This column provides the market capitalization of the company.

  • Beta: This column provides the beta coefficient of the company's stock, which measures the volatility of the stock in relation to the overall market.

  • Is S&P500: This column provides a binary indicator of whether the company is a constituent of the S&P 500 index (True if yes, False if no).

(Reference: https://library.tu.ac.th/uload/userfiles/files/Refinitv%20Eikon%20manual.pdf , https://community.developers.refinitiv.com/questions/85483/how-to-find-documentation-on-sdg-data-items-fy0-an.html)

In [5]:
# Rename columns (one dict keeps the mapping readable)
Industry = Industry.rename(columns={
    "ESG Score\n(FY0)\n(Σ=Avg)": "ESG Score (FY0)",
    "ESG Period Last Update Date\n(FY0)": "ESG Score Latest Update Date (FY0)",
    "ESG Score\n(FY-1)\n(Σ=Avg)": "ESG Score (FY-1)",
    "Original Announcement Date Time\n(FY0)": "Latest Financial Statement Announcement Time (FY0)",
    "Ticker Symbol": "Ticker Symbol (Simplified)",
    "Identifier (RIC)": "Ticker Symbol",
    "TRBC Economic Sector Name": "Sector",
    "Market Cap\n(Σ=Avg)": "Market Cap",
    "Beta\n(Σ=Avg)": "Beta",
})

# Strip the letter-grade rating and keep only the numeric score
Industry['ESG Score (FY0)'] = Industry['ESG Score (FY0)'].str.extract(r'(\d+\.\d+)').astype(float)
Industry['ESG Score (FY-1)'] = Industry['ESG Score (FY-1)'].str.extract(r'(\d+\.\d+)').astype(float)

# Convert 'Market Cap' to numeric after stripping the thousands separators
Industry['Market Cap'] = pd.to_numeric(Industry['Market Cap'].str.replace(',', ''))

# The provided YoY change is rounded to two decimals; recompute it for full precision
Industry['Yoy Change in ESG Score in FY0 (%)'] = ((Industry['ESG Score (FY0)']/Industry['ESG Score (FY-1)'])-1)*100
Industry = Industry.drop("YoY % Chg in\nESG Score\n(FY0)\n(Σ=Avg)", axis=1)

# Reorder columns
new_column_order = ["Company Name", "Sector", "Ticker Symbol", "Ticker Symbol (Simplified)", "ESG Score (FY0)", "Yoy Change in ESG Score in FY0 (%)","ESG Score (FY-1)","ESG Score Latest Update Date (FY0)", "Market Cap", "Beta"]
Industry = Industry[new_column_order]

# Drop the rows that contain NA in the ESG Score (FY0) - Latest ESG Score column
Industry = Industry.dropna(subset=['ESG Score (FY0)'])

# Load the list of S&P 500 companies
list_of_SP500 = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/constituents.csv")
list_of_SP500 = list_of_SP500.rename(columns={"Symbol": "Ticker Symbol (Simplified)"})
list_of_SP500

# Merge and keep only S&P 500 companies
merged_data_Industry = pd.merge(Industry, list_of_SP500, on='Ticker Symbol (Simplified)', how='left')
merged_data_Industry['Is S&P500'] = merged_data_Industry['Ticker Symbol (Simplified)'].isin(list_of_SP500['Ticker Symbol (Simplified)'])
Industry_SP500 = merged_data_Industry.loc[merged_data_Industry['Is S&P500'] == True]

# Reorder columns
Industry_SP500 = Industry_SP500[["Company Name", "Sector", "Ticker Symbol", "Ticker Symbol (Simplified)", "ESG Score (FY0)", "Yoy Change in ESG Score in FY0 (%)","ESG Score (FY-1)","ESG Score Latest Update Date (FY0)", "Market Cap", "Beta", "Is S&P500"]]
Industry_SP500
Out[5]:
Company Name Sector Ticker Symbol Ticker Symbol (Simplified) ESG Score (FY0) Yoy Change in ESG Score in FY0 (%) ESG Score (FY-1) ESG Score Latest Update Date (FY0) Market Cap Beta Is S&P500
0 Baker Hughes Co Energy BKR.OQ BKR 92.13 8.311780 85.06 29/03/2023 2.884841e+10 1.47 True
1 Chevron Corp Energy CVX.N CVX 86.20 2.999164 83.69 29/03/2023 3.196539e+11 1.15 True
2 Kinder Morgan Inc Energy KMI.N KMI 85.90 -1.614935 87.31 29/03/2023 3.981249e+10 0.92 True
3 Halliburton Co Energy HAL.N HAL 85.78 -4.284758 89.62 29/03/2023 2.953318e+10 2.12 True
4 Schlumberger NV Energy SLB.N SLB 85.47 5.206795 81.24 29/03/2023 7.046975e+10 1.78 True
... ... ... ... ... ... ... ... ... ... ... ...
3099 Camden Property Trust Real Estate CPT.N CPT 62.87 -7.666324 68.09 29/03/2023 1.115660e+10 0.80 True
3104 Equity Residential Real Estate EQR.N EQR 61.69 2.526176 60.17 09/03/2023 2.276917e+10 0.83 True
3113 VICI Properties Inc Real Estate VICI.N VICI 56.91 -3.981778 59.27 29/03/2023 3.245590e+10 0.96 True
3116 Invitation Homes Inc Real Estate INVH.N INVH 54.85 21.081678 45.30 29/03/2023 1.926749e+10 0.90 True
3185 Extra Space Storage Inc Real Estate EXR.N EXR 33.06 -3.727432 34.34 29/03/2023 2.197784e+10 0.56 True

475 rows × 11 columns

III - Research Focus: S&P 500 Stocks¶

Globally, there has been an upward trend in the development of ESG guidance in stock markets. With a number of important exchanges and regulatory organizations (such as the New York Stock Exchange and NASDAQ) actively encouraging ESG integration in investment decisions and disclosures, the American market in particular has been a prominent driver of this movement.

Additionally, the US has a rather well-developed legislative structure for ESG reporting, with businesses receiving advice from groups like the Global Reporting Initiative (GRI) and the Sustainability Accounting Standards Board (SASB).

We concentrate on the US stock market for various reasons:

1) The US Securities and Exchange Commission (SEC) requires publicly listed firms to publish certain ESG-related information in their annual reports, such as climate risks, human capital management, and board diversity. This makes it simpler for researchers to acquire and examine ESG data for US-listed corporations.

2) The US market is the biggest in the world, according to the World Federation of Exchanges (2021), with a total market capitalization of more than $47 trillion as of 2020. It comprises a major share of the world's investment opportunities and has the potential to significantly affect both the environment and the global economy.

3) According to MSCI (2017), the US market is home to a number of sizable multinational enterprises that operate in a variety of industries, giving researchers a diversified group of businesses to examine from an ESG standpoint.

4) The S&P 500 is under intense scrutiny from regulators and investors, so businesses listed on the index are more likely to be transparent and accountable in their ESG reporting.

IV - Data Visualizations¶

1) Display the Progression of Establishing ESG Guidance in Stock Exchange across World¶

In [6]:
fig = px.scatter_geo(ESG_Disclosure_across_world, locations="iso_alpha", 
                     hover_name="Stock exchage & ESG guidance",
                     text = "Market", 
                     scope = "world",
                     animation_frame = "Year", 
                     projection="natural earth",
                     title='Progression of Establishing ESG Guidance across World')
fig.show()

da = ESG_Disclosure_across_world.query("Year == 2022")
fig = px.scatter_geo(da, locations="iso_alpha", 
                     hover_name="Stock exchage & ESG guidance",
                     color = "continent",
                     text = "Market", 
                     scope = "world",
                     animation_frame = "Year", 
                     projection="equirectangular",
                     title='Progression of Establishing ESG Guidance till 2022')
fig.show()

2) Display the Market Cap and ESG Score across Sector¶

The Treemap visualization is interactive. You can click on each sector to see the details of individual companies.¶
In [7]:
fig = px.treemap(Industry_SP500, path=[px.Constant("S&P 500"), 'Sector', 'Company Name'], values='Market Cap',
                  color='Market Cap', hover_data=['Ticker Symbol'],
                  color_continuous_scale='dense')
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25), title = "Market Cap Treemap of S&P 500")
fig.show()
In [8]:
fig = px.treemap(Industry_SP500, path=[px.Constant("S&P 500"), 'Sector', 'Company Name'], values='ESG Score (FY0)',
                  color='ESG Score (FY0)', hover_data=['Ticker Symbol'],
                  color_continuous_scale='RdYlGn')
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25), title = "ESG Score Treemap of S&P 500")
fig.show()
In [9]:
# Compute the average ESG Score by Sector
sector_scores_SP500only = Industry_SP500.groupby('Sector')['ESG Score (FY0)'].mean()

# Compute the average Market Cap by Sector
sector_market_cap_SP500only = Industry_SP500.groupby('Sector')['Market Cap'].mean()

# Combine into one DataFrame (both columns are sector averages)
combined_data = pd.concat([sector_scores_SP500only, sector_market_cap_SP500only], axis=1)
combined_data.columns = ['Average ESG Score across sectors (S&P 500 only)', "Average Market Cap (S&P 500 only)"]
combined_data = combined_data.sort_values("Average ESG Score across sectors (S&P 500 only)", ascending=True)
combined_data

# Convert to lists in order to visualize
sector = combined_data.index.tolist()
sc = combined_data['Average ESG Score across sectors (S&P 500 only)'].tolist()
cap = combined_data['Average Market Cap (S&P 500 only)'].tolist()


# Create 2 Plot
fig = make_subplots(rows=1, cols=2, specs=[[{}, {}]], shared_xaxes=True,
                    shared_yaxes=False, vertical_spacing=0.001)

# Create Bar Plot on the left
fig.append_trace(go.Bar(
    x=sc,
    y=sector,
    marker=dict(
        color='rgba(50, 171, 96, 0.6)',
        line=dict(
            color='rgba(50, 171, 96, 1.0)',
            width=1),
    ),
    name='Average ESG Score for Each Sector',
    orientation='h',
), 1, 1)

# Create Scatter Plot on the right
fig.append_trace(go.Scatter(
    x=cap, y=sector,
    mode='lines+markers',
    line_color='rgb(116, 116, 212)',
    name='Market Cap for Each Sector',
), 1, 2)

# Adjusting graph layout
fig.update_layout(
    title='Average ESG Score and Average Market Cap for Each Sector within S&P 500',
    yaxis=dict(
        showgrid=True,
        showline=False,
        showticklabels=True,
        domain=[0, 0.85],
    ),
    yaxis2=dict(
        showgrid=True,
        showline=True,
        showticklabels=False,
        linecolor='rgba(102, 102, 102, 0.8)',
        linewidth=2,
        domain=[0, 0.85],
    ),
    xaxis=dict(
        zeroline=False,
        showline=False,
        showticklabels=False,
        showgrid=True,
        domain=[0, 0.42],
    ),
    xaxis2=dict(
        zeroline=False,
        showline=False,
        showticklabels=False,
        showgrid=True,
        domain=[0.47, 1],
        side='bottom',
        dtick=1000,
    ),
    legend=dict(x=0.029, y=1.038, font_size=10),
    margin=dict(l=100, r=20, t=70, b=70),
    paper_bgcolor='rgb(248, 248, 255)',
    plot_bgcolor='rgb(248, 248, 255)',
)

annotations = []

y_s = np.round(sc, decimals=2)
ccap = [float(s) / 1000000000 for s in cap]

y_nw = np.rint(ccap)

# Adding labels
for ydn, yd, xd in zip(y_nw, y_s, sector):
    annotations.append(dict(xref='x2', yref='y2',
                            y=xd, x=ydn,
                            text='{:,}'.format(ydn) + 'B',
                            font=dict(family='Arial', size=12,
                                      color='rgb(116, 116, 212)'),
                            showarrow=True))
    annotations.append(dict(xref='x1', yref='y1',
                            y=xd, x=yd + 10,
                            text=str(yd),
                            font=dict(family='Arial', size=12,
                                      color='rgb(50, 171, 96)'),
                            showarrow=False))
    
fig.update_layout(annotations=annotations)

fig.show()
Discussion¶

Industrials have an average ESG score of 63.9, while Consumer Non-Cyclicals have an average ESG score of 75.2. Consumer Non-Cyclicals has the highest average ESG score, followed by Healthcare and Real Estate. Industrials have the lowest average ESG score, followed by Consumer Cyclicals and Financials.

Market value also differs across the S&P 500 sectors, ranging from 2.67 trillion for Basic Materials to 157.72 trillion for Technology. Healthcare, Consumer Non-Cyclicals, and Technology rank among the largest by market capitalization, in that order, while Basic Materials has the lowest.

Some sectors often have higher ESG scores than others when comparing the average ESG scores across the various sectors. This demonstrates the need to take ESG aspects into account when investing in various areas of the economy.

In the following part, we further investigate how the ESG score relates to volatility (Beta).

Q1: Does ESG explain Stock Volatilities?¶

Correlation between ESG Score and Beta in different sectors¶

What is Beta¶

Beta is a measure of the volatility, or systematic risk, of a security or portfolio in comparison to the overall market. It is a statistical indicator that assesses how an asset's price has moved relative to a benchmark, such as the S&P 500 index. When an asset's beta is 1, it moves in lockstep with the market; when its beta is greater than 1, it is more volatile than the market; and when its beta is smaller than 1, it is less volatile than the market.

Investors frequently use beta to judge an investment's risk relative to the market as a whole. Higher-beta assets often offer greater potential returns but also carry higher risk, because they are more vulnerable to market volatility. In contrast, lower-beta assets offer lower potential returns but also lower levels of risk and volatility.
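Concretely, beta is the covariance of the asset's returns with the market's returns divided by the variance of the market's returns. A minimal numerical illustration with made-up daily returns:

```python
import numpy as np

# Beta = Cov(asset, market) / Var(market); both return series are made up.
market = np.array([0.01, -0.02, 0.015, 0.005, -0.01])
asset = np.array([0.02, -0.03, 0.025, 0.010, -0.02])

beta = np.cov(asset, market, ddof=1)[0, 1] / np.var(market, ddof=1)
print(round(beta, 2))  # about 1.68: this asset is more volatile than the market
```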

Overall, we expect to see a mixed relationship between ESG scores and Beta across different sectors. Some sectors may show a positive correlation, while others may show a negative correlation or no correlation at all.

In [10]:
fig = px.scatter(Industry_SP500, x="ESG Score (FY0)", y="Beta", facet_col_wrap=5, facet_col="Sector", color= "Sector", 
                 width = 1050, trendline= "ols")

fig.update_layout(title='Relationship between Beta and ESG Score across Sectors', showlegend=False)  
fig.show()
In [11]:
# Build a table with Beta plus one ESG score column per sector
sectors = ["Energy", "Basic Materials", "Industrials", "Consumer Cyclicals",
           "Consumer Non-Cyclicals", "Financials", "Healthcare", "Technology",
           "Utilities", "Real Estate"]

Corr = None
for sector in sectors:
    sub = Industry_SP500.loc[Industry_SP500["Sector"] == sector, ["Beta", "ESG Score (FY0)"]]
    sub = sub.rename(columns={"ESG Score (FY0)": f"ESG Score in {sector}"})
    Corr = sub if Corr is None else pd.merge(Corr, sub, how="outer")

# Convert the DataFrame into a correlation matrix
Corr = Corr.corr()
Corr

# Visualization
plt.figure(figsize=(16, 6))
sns.heatmap(Corr, vmin=-1, vmax=1, annot=True, cmap='BrBG',
            annot_kws={'fontsize':7, 'fontweight':'bold'},
           square=True)

plt.title('Correlation between ESG Score and Beta across Sectors')

plt.show()

V - Analysis¶

There is a positive correlation between ESG Score and Beta in the Financials and Real Estate sectors, with correlation coefficients of 0.45 and 0.50 on the correlation matrix. The corresponding r-squared values are roughly 0.20 and 0.25, meaning about 20% / 25% of the variation in Beta (Y) is explained by variation in the Financials / Real Estate ESG Score (X), respectively. This suggests that companies in these sectors with higher ESG scores tend to have higher betas, i.e. they are more sensitive to market fluctuations.

  • According to Weber (2014), the financial institutions that prioritize sustainability and responsible business practices may be more attractive to socially responsible investors, leading to increased demand for their stocks and potentially driving up their beta. Also, companies with higher ESG Scores are more likely to be well-governed and have better risk management practices, which could lead to higher expected returns and higher betas.
  • Chen et al. (2020) suggest that effective ESG disclosure can have a positive impact on REIT debt financing and firm value due to the increased corporate transparency, which can lead to higher beta values, indicating a greater sensitivity to market fluctuations.

On the other hand, there is a negative correlation between ESG Score and Beta in the Healthcare sector, with a correlation coefficient of -0.27 and an r-squared of roughly 0.07, meaning about 7% of the variation in Beta (Y) is explained by variation in the Healthcare ESG Score (X). This suggests that companies in this sector with higher ESG scores tend to have lower betas, i.e. they are less sensitive to market fluctuations. This negative correlation may reflect the fact that the healthcare sector is less cyclical and less affected by economic swings, resulting in lower volatility.

  • Healthcare companies tend to have lower betas overall, as they are seen as less sensitive to market fluctuations. Companies in this sector may be less likely to engage in risky or speculative activities that would raise their betas, since their primary focus is providing essential products and services. Additionally, they may be subject to more stringent ESG regulations and oversight, which could reduce their exposure to certain risks and therefore lower their betas. Firms with strong ESG performance are better able to mitigate external risks and uncertainties.
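For a simple one-variable regression of Beta on an ESG score, r-squared is just the squared correlation coefficient, so it can be read straight off the heatmap values:

```python
# r² of a one-variable linear regression equals the square of the correlation coefficient
for sector, r in [("Financials", 0.45), ("Real Estate", 0.50), ("Healthcare", -0.27)]:
    print(f"{sector}: r = {r:+.2f}, r² = {r * r:.2f}")
# Financials: r = +0.45, r² = 0.20
# Real Estate: r = +0.50, r² = 0.25
# Healthcare: r = -0.27, r² = 0.07
```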
Hypothesis¶

In the Financials and Real Estate sectors there is a positive association between ESG Score and Beta, whereas in the Healthcare sector the correlation is negative.

To further investigate the relationship between market volatility and changes in ESG scores, we select the top three and bottom three ESG-score companies in each of these three sectors. We anticipate that the time-series graphs will highlight the association between stock volatility and ESG Score, and that the stock trend will change around the periods when the ESG score changes.

VI - Data Collection for Selected Stocks¶

A ) Positive Relationship Between ESG Score and Beta - Financials Sector¶

1) Identify the top three and bottom three ESG score companies in the Financials Sector:¶

In [12]:
Financials = Industry_SP500.loc[Industry_SP500["Sector"] == "Financials"]
Financials = Financials.sort_values("ESG Score (FY0)", ascending=False)
Financials.head(3)
Out[12]:
Company Name Sector Ticker Symbol Ticker Symbol (Simplified) ESG Score (FY0) Yoy Change in ESG Score in FY0 (%) ESG Score (FY-1) ESG Score Latest Update Date (FY0) Market Cap Beta Is S&P500
1265 Citigroup Inc Financials C.N C 87.25 -1.867057 88.91 31/03/2023 8.926487e+10 1.56 True
1266 Goldman Sachs Group Inc Financials GS.N GS 84.44 -2.987132 87.04 31/03/2023 1.076154e+11 1.40 True
1267 JPMorgan Chase & Co Financials JPM.N JPM 83.09 -1.981833 84.77 29/03/2023 3.736733e+11 1.08 True
In [13]:
Financials = Industry_SP500.loc[Industry_SP500["Sector"] == "Financials"]
Financials = Financials.sort_values("ESG Score (FY0)", ascending=True)
Financials.head(3)
Out[13]:
Company Name Sector Ticker Symbol Ticker Symbol (Simplified) ESG Score (FY0) Yoy Change in ESG Score in FY0 (%) ESG Score (FY-1) ESG Score Latest Update Date (FY0) Market Cap Beta Is S&P500
1564 Brown & Brown Inc Financials BRO.N BRO 36.28 14.088050 31.80 29/03/2023 1.681318e+10 0.74 True
1457 Loews Corp Financials L.N L 44.06 4.929745 41.99 29/03/2023 1.301680e+10 0.83 True
1430 First Republic Bank Financials FRC.N FRC 46.86 3.741421 45.17 09/03/2023 2.566443e+09 1.01 True

2) Gather their weekly closing stock prices from 2011-01-01 to 2023-04-08 from Yahoo Finance:¶

1 - Read the csv file

2 - Convert the "Date" column into datetime format

3 - Rename the "Close" column to the company name

4 - Merge all the series

5 - Separate into a "high score dataframe" and a "low score dataframe"

In [14]:
C = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/C%20(1).csv")
C['Date'] = pd.to_datetime(C['Date'], format='%Y-%m-%d')
C = C.set_index(C['Date'])
C = C[['Close']]
C = C.rename(columns={"Close": "Citigroup"})

GS = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/GS%20(1).csv")
GS['Date'] = pd.to_datetime(GS['Date'], format='%Y-%m-%d')
GS = GS.set_index(GS['Date'])
GS = GS[['Close']]
GS = GS.rename(columns={"Close": "Goldman Sachs Group"})

JPM = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/JPM%20(1).csv")
JPM['Date'] = pd.to_datetime(JPM['Date'], format='%Y-%m-%d')
JPM = JPM.set_index(JPM['Date'])
JPM = JPM[['Close']]
JPM = JPM.rename(columns={"Close": "JPMorgan Chase & Co"})

BRO = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/BRO%20(1).csv")
BRO['Date'] = pd.to_datetime(BRO['Date'], format='%Y-%m-%d')
BRO = BRO.set_index(BRO['Date'])
BRO = BRO[['Close']]
BRO = BRO.rename(columns={"Close": "Brown & Brown"})

L = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/L%20(1).csv")
L['Date'] = pd.to_datetime(L['Date'], format='%Y-%m-%d')
L = L.set_index(L['Date'])
L = L[['Close']]
L = L.rename(columns={"Close": "Loews Corp"})

FRC = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/FRC%20(1).csv")
FRC['Date'] = pd.to_datetime(FRC['Date'], format='%Y-%m-%d')
FRC = FRC.set_index(FRC['Date'])
FRC = FRC[['Close']]
FRC = FRC.rename(columns={"Close": "First Republic Bank"})

merged_df = pd.merge(C, GS, left_index=True, right_index=True)
merged_df = pd.merge(merged_df, JPM, left_index=True, right_index=True)
merged_df = pd.merge(merged_df, BRO, left_index=True, right_index=True)
merged_df = pd.merge(merged_df, L, left_index=True, right_index=True)
merged_df = pd.merge(merged_df, FRC, left_index=True, right_index=True)
high_score_df = merged_df[["Citigroup", "Goldman Sachs Group", "JPMorgan Chase & Co"]]
low_score_df = merged_df[["Brown & Brown", "Loews Corp", "First Republic Bank"]]

3) Gather their annual ESG rating (from D- to A+) from Refinitiv:¶

1 - Read the csv file

2 - Set the index

3 - Put the "period end date" and "esg score" rows into lists

(These lists feed a for loop that draws a vertical time band for each ESG grading period during visualization)
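The (start, end) interval of each grading band comes from pairing consecutive period-end dates; the `range(len(...) - 1)` loops below do exactly this, which can also be written with `zip`:

```python
# Pair consecutive period-end dates into (start, end) bands for the vertical rectangles
period_end_dates = ["2011-12-31", "2012-12-31", "2013-12-31"]  # hypothetical sample
bands = list(zip(period_end_dates, period_end_dates[1:]))
print(bands)  # [('2011-12-31', '2012-12-31'), ('2012-12-31', '2013-12-31')]
```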

In [15]:
C_score = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/ESG%20Table%20for%20C.N%20(1).csv", skiprows = [0,1])
C_score = C_score.set_index('Unnamed: 0', inplace=False)
C_period_end_date = C_score.iloc[0].tolist()
C_esg_scores = C_score.iloc[7].tolist()

GS_score = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/ESG%20Table%20for%20GS.N%20(1).csv", skiprows = [0,1])
GS_score = GS_score.set_index('Unnamed: 0', inplace=False)
GS_period_end_date = GS_score.iloc[0].tolist()
GS_esg_scores = GS_score.iloc[7].tolist()

JPM_score = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/ESG%20Table%20for%20JPM.N%20(1).csv", skiprows = [0,1])
JPM_score = JPM_score.set_index('Unnamed: 0', inplace=False)
JPM_period_end_date = JPM_score.iloc[0].tolist()
JPM_esg_scores = JPM_score.iloc[7].tolist()

BRO_score = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/ESG%20Table%20for%20BRO.N%20(1).csv", skiprows = [0,1])
BRO_score = BRO_score.set_index('Unnamed: 0', inplace=False)
BRO_period_end_date = BRO_score.iloc[0].tolist()
BRO_esg_scores = BRO_score.iloc[7].tolist()

L_score = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/ESG%20Table%20for%20L.N%20(1).csv", skiprows = [0,1])
L_score = L_score.set_index('Unnamed: 0', inplace=False)
L_period_end_date = L_score.iloc[0].tolist()
L_esg_scores = L_score.iloc[7].tolist()

FRC_score = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/ESG%20Table%20for%20FRC.N%20(1).csv", skiprows = [0,1])
FRC_score = FRC_score.set_index('Unnamed: 0', inplace=False)
FRC_period_end_date = FRC_score.iloc[0].tolist()
FRC_esg_scores = FRC_score.iloc[7].tolist()

4) Design a function that maps each ESG grade (from D- to A+) to a color (from red to green):¶

In [16]:
def color_score(x):
    """Map an ESG letter grade (D- to A+) to a hex color from red to green."""
    grade_colors = {
        "D-": "#FF0000", "D": "#FF1100", "D+": "#FF3300",
        "C-": "#FF6600", "C": "#FF9900", "C+": "#FFCC00",
        "B-": "#FFFF00", "B": "#CCFF00", "B+": "#99FF00",
        "A-": "#66FF00", "A": "#33FF00", "A+": "#00FF00",
    }
    # Fall back to a yellow-green for any unrecognized grade
    return grade_colors.get(x, "#77FF00")

5) Visualization:¶

1 - Create time-series plots (one for the top 3 and one for the bottom 3)

2 - Loop over the top 3 / bottom 3 companies, drawing one vertical band per reporting period to show the change in ESG grading

3 - Fill each band with the color of its ESG grade

4 - Annotate each company's Beta

5 - Set the title, line colors, axis range, etc.

In [17]:
fig = px.line(high_score_df, facet_col="variable", facet_col_wrap=3, labels=dict(value="Stock Price", variable="Company"), 
              color_discrete_sequence=["#eb663b", "#511cfb", "#00a08b"], range_y=[0,450])
        
for i in range(len(C_period_end_date) - 1):
    esg_score = C_esg_scores[i]
    fillcolor = color_score(esg_score)
    
    fig.add_vrect(x0=C_period_end_date[i], x1=C_period_end_date[i+1], col=1,
                  annotation_text=esg_score, annotation_position="top left",
                  fillcolor=fillcolor, opacity=0.15, line_width=1)
    
fig.add_vrect(x0=C_period_end_date[0], x1=C_period_end_date[1], col=1,annotation_text="Beta = 1.56", annotation_position="inside bottom right", opacity=0.15, line_width=0)    

for i in range(len(GS_period_end_date) - 1):
    esg_score = GS_esg_scores[i]
    fillcolor = color_score(esg_score)
    
    fig.add_vrect(x0=GS_period_end_date[i], x1=GS_period_end_date[i+1], col=2,
                  annotation_text=esg_score, annotation_position="top left",
                  fillcolor=fillcolor, opacity=0.15, line_width=1)

fig.add_vrect(x0=GS_period_end_date[0], x1=GS_period_end_date[1], col=2,annotation_text="Beta = 1.40", annotation_position="inside bottom right", opacity=0.15, line_width=0)    

for i in range(len(JPM_period_end_date) - 1):
    esg_score = JPM_esg_scores[i]
    fillcolor = color_score(esg_score)
    
    fig.add_vrect(x0=JPM_period_end_date[i], x1=JPM_period_end_date[i+1], col=3,
                  annotation_text=esg_score, annotation_position="top left",
                  fillcolor=fillcolor, opacity=0.15, line_width=1)

fig.add_vrect(x0=JPM_period_end_date[0], x1=JPM_period_end_date[1], col=3,annotation_text="Beta = 1.08", annotation_position="inside bottom right", opacity=0.15, line_width=0)    

fig.update_layout(title='Stock Price and ESG Rating of the Three Highest ESG Score Financial Companies in S&P500',
                  height=400,legend=dict(y=0.5, font_size=10)) 

fig.show()  

fig = px.line(low_score_df, facet_col="variable", facet_col_wrap=3, labels=dict(value="Stock Price", variable="Company"),
             color_discrete_sequence=["#b2828d", "#6c7c32", "#778aae"], range_y=[0,450])
for i in range(len(BRO_period_end_date) - 1):
    esg_score = BRO_esg_scores[i]
    fillcolor = color_score(esg_score)
    
    fig.add_vrect(x0=BRO_period_end_date[i], x1=BRO_period_end_date[i+1], col=1,
                  annotation_text=esg_score, annotation_position="top left",
                  fillcolor=fillcolor, opacity=0.15, line_width=1)

fig.add_vrect(x0=BRO_period_end_date[0], x1=BRO_period_end_date[1], col=1,annotation_text="Beta = 0.74", annotation_position="inside bottom right", opacity=0.15, line_width=0)    

for i in range(len(L_period_end_date) - 1):
    esg_score = L_esg_scores[i]
    fillcolor = color_score(esg_score)
    
    fig.add_vrect(x0=L_period_end_date[i], x1=L_period_end_date[i+1], col=2,
                  annotation_text=esg_score, annotation_position="top left",
                  fillcolor=fillcolor, opacity=0.15, line_width=1)

fig.add_vrect(x0=L_period_end_date[0], x1=L_period_end_date[1], col=2,annotation_text="Beta = 0.83", annotation_position="inside bottom right", opacity=0.15, line_width=0)    

    
for i in range(len(FRC_period_end_date) - 1):
    esg_score = FRC_esg_scores[i]
    fillcolor = color_score(esg_score)
    
    fig.add_vrect(x0=FRC_period_end_date[i], x1=FRC_period_end_date[i+1], col=3,
                  annotation_text=esg_score, annotation_position="top left",
                  fillcolor=fillcolor, opacity=0.15, line_width=1)
    
fig.add_vrect(x0=FRC_period_end_date[0], x1=FRC_period_end_date[1], col=3,annotation_text="Beta = 1.01", annotation_position="inside bottom right", opacity=0.15, line_width=0)    
    
fig.update_layout(title='Stock Price and ESG Rating of the Three Lowest ESG Score Financial Companies in S&P500', 
                  height=400, legend=dict(y=0.5, font_size=10))  
fig.show()
Discussion - Results for the Financials Sector¶

We observe that the top 3 ESG-rated financial companies generally have a larger Beta, with more volatility and growth potential than the bottom 3 ESG-rated financial companies, which generally have a lower Beta and less growth potential. One thing to notice is that although First Republic Bank appears to have delivered strong returns before 2022, its ESG rating does reflect its weakness in risk management practices: "First Republic Bank’s wealthy depositors have been moving their money to larger institutions seen as less likely to go under. The bank’s depositors have withdrawn roughly $70 billion since SVB’s collapse earlier this month, the Wall Street Journal reported." This supports the view that companies with higher ESG Scores are more likely to be well governed and to have better risk management practices.

B ) Positive Relationship Between ESG Score and Beta - Real Estate Sector¶

1) Identify the top three and bottom three ESG score companies in the Real Estate Sector:¶

In [18]:
Real_Estate = Industry_SP500.loc[Industry_SP500["Sector"] == "Real Estate"]
Real_Estate = Real_Estate.sort_values("ESG Score (FY0)", ascending=False)
Real_Estate.head(3)
Out[18]:
Company Name Sector Ticker Symbol Ticker Symbol (Simplified) ESG Score (FY0) Yoy Change in ESG Score in FY0 (%) ESG Score (FY-1) ESG Score Latest Update Date (FY0) Market Cap Beta Is S&P500
3038 Healthpeak Properties Inc Real Estate PEAK.N PEAK 88.59 -1.325462 89.78 31/03/2023 1.172752e+10 0.88 True
3039 CBRE Group Inc Real Estate CBRE.N CBRE 85.53 -0.650482 86.09 29/03/2023 2.185135e+10 1.36 True
3040 Realty Income Corp Real Estate O.N O 85.43 10.118587 77.58 09/03/2023 4.132990e+10 0.80 True
In [19]:
Real_Estate = Industry_SP500.loc[Industry_SP500["Sector"] == "Real Estate"]
Real_Estate = Real_Estate.sort_values("ESG Score (FY0)", ascending=True)
Real_Estate.head(3)
Out[19]:
Company Name Sector Ticker Symbol Ticker Symbol (Simplified) ESG Score (FY0) Yoy Change in ESG Score in FY0 (%) ESG Score (FY-1) ESG Score Latest Update Date (FY0) Market Cap Beta Is S&P500
3185 Extra Space Storage Inc Real Estate EXR.N EXR 33.06 -3.727432 34.34 29/03/2023 2.197784e+10 0.56 True
3116 Invitation Homes Inc Real Estate INVH.N INVH 54.85 21.081678 45.30 29/03/2023 1.926749e+10 0.90 True
3113 VICI Properties Inc Real Estate VICI.N VICI 56.91 -3.981778 59.27 29/03/2023 3.245590e+10 0.96 True

2) Gather their weekly closing stock prices from 2011-01-01 to 2023-04-08 from Yahoo Finance:¶

1 - Read the csv file

2 - Convert the "Date" column into datetime format

3 - Rename the "Close" column to the company name

4 - Merge all the series

5 - Separate into a "high score dataframe" and a "low score dataframe"

In [20]:
PEAK = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/PEAK%20(2).csv")
PEAK['Date'] = pd.to_datetime(PEAK['Date'], format='%Y-%m-%d')
PEAK = PEAK.set_index(PEAK['Date'])
PEAK = PEAK[['Close']]
PEAK = PEAK.rename(columns={"Close": "Healthpeak Properties"})

CBRE = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/CBRE%20(1).csv")
CBRE['Date'] = pd.to_datetime(CBRE['Date'], format='%Y-%m-%d')
CBRE = CBRE.set_index(CBRE['Date'])
CBRE = CBRE[['Close']]
CBRE = CBRE.rename(columns={"Close": "CBRE Group"})

O = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/O%20(1).csv")
O['Date'] = pd.to_datetime(O['Date'], format='%Y-%m-%d')
O = O.set_index(O['Date'])
O = O[['Close']]
O = O.rename(columns={"Close": "Realty Income Corp"})

EXR = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/EXR%20(1).csv")
EXR['Date'] = pd.to_datetime(EXR['Date'], format='%Y-%m-%d')
EXR = EXR.set_index(EXR['Date'])
EXR = EXR[['Close']]
EXR = EXR.rename(columns={"Close": "Extra Space Storage"})

INVH = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/INVH%20(2).csv")
INVH['Date'] = pd.to_datetime(INVH['Date'], format='%Y-%m-%d')
INVH = INVH.set_index(INVH['Date'])
INVH = INVH[['Close']]
INVH = INVH.rename(columns={"Close": "Invitation Homes"})

VICI = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/VICI%20(2).csv")
VICI['Date'] = pd.to_datetime(VICI['Date'], format='%Y-%m-%d')
VICI = VICI.set_index(VICI['Date'])
VICI = VICI[['Close']]
VICI = VICI.rename(columns={"Close": "VICI Properties"})

high_score_df = pd.merge(PEAK, CBRE, how="outer", left_on="Date", right_on = "Date")
high_score_df = pd.merge(high_score_df, O, how="outer", left_on="Date", right_on = "Date")

low_score_df = pd.merge(EXR, INVH, how="outer", left_on="Date", right_on = "Date")
low_score_df = pd.merge(low_score_df, VICI, how="outer", left_on="Date", right_on = "Date")

3) Gather their annual ESG rating (from D- to A+) from Refinitiv:¶

1 - Read the csv file

2 - Set the index

3 - Put the "period end date" and "esg score" rows into lists

(These lists feed a for loop that draws a vertical time band for each ESG grading period during visualization)

In [21]:
PEAK_score = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/ESG%20Table%20for%20PEAK.N%20(1).csv", skiprows = [0,1])
PEAK_score = PEAK_score.set_index('Unnamed: 0', inplace=False)
PEAK_period_end_date = PEAK_score.iloc[0].tolist()
PEAK_esg_scores = PEAK_score.iloc[7].tolist()

CBRE_score = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/ESG%20Table%20for%20CBRE.N%20(1).csv", skiprows = [0,1])
CBRE_score = CBRE_score.set_index('Unnamed: 0', inplace=False)
CBRE_period_end_date = CBRE_score.iloc[0].tolist()
CBRE_esg_scores = CBRE_score.iloc[7].tolist()

O_score = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/ESG%20Table%20for%20O.N%20(1).csv", skiprows = [0,1])
O_score = O_score.set_index('Unnamed: 0', inplace=False)
O_period_end_date = O_score.iloc[0].tolist()
O_esg_scores = O_score.iloc[7].tolist()

EXR_score = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/ESG%20Table%20for%20EXR.N%20(1).csv", skiprows = [0,1])
EXR_score = EXR_score.set_index('Unnamed: 0', inplace=False)
EXR_period_end_date = EXR_score.iloc[0].tolist()
EXR_esg_scores = EXR_score.iloc[7].tolist()

INVH_score = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/ESG%20Table%20for%20INVH.N%20(1).csv", skiprows = [0,1])
INVH_score = INVH_score.set_index('Unnamed: 0', inplace=False)
INVH_period_end_date = INVH_score.iloc[0].tolist()
INVH_esg_scores = INVH_score.iloc[7].tolist()

VICI_score = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/ESG%20Table%20for%20VICI.N%20(1).csv", skiprows = [0,1])
VICI_score = VICI_score.set_index('Unnamed: 0', inplace=False)
VICI_period_end_date = VICI_score.iloc[0].tolist()
VICI_esg_scores = VICI_score.iloc[7].tolist()

4) Visualization:¶

1 - Create time-series plots (one for the top 3 and one for the bottom 3)

2 - Loop over the top 3 / bottom 3 companies, drawing one vertical band per reporting period to show the change in ESG grading

3 - Fill each band with the color of its ESG grade

4 - Annotate each company's Beta

5 - Set the title, line colors, axis range, etc.

In [22]:
fig = px.line(high_score_df, facet_col="variable", facet_col_wrap=3, labels=dict(value="Stock Price", variable="Company"), 
              color_discrete_sequence=["#2e91e5", "#e15f99", "#1ca71c"], range_y=[0,250])
        
for i in range(len(PEAK_period_end_date) - 1):
    esg_score = PEAK_esg_scores[i]
    fillcolor = color_score(esg_score)
    
    fig.add_vrect(x0=PEAK_period_end_date[i], x1=PEAK_period_end_date[i+1], col=1,
                  annotation_text=esg_score, annotation_position="top left",
                  fillcolor=fillcolor, opacity=0.15, line_width=1)
    
fig.add_vrect(x0=PEAK_period_end_date[0], x1=PEAK_period_end_date[1], col=1,annotation_text="Beta = 0.88", annotation_position="inside bottom right", opacity=0.15, line_width=0)    

for i in range(len(CBRE_period_end_date) - 1):
    esg_score = CBRE_esg_scores[i]
    fillcolor = color_score(esg_score)
    
    fig.add_vrect(x0=CBRE_period_end_date[i], x1=CBRE_period_end_date[i+1], col=2,
                  annotation_text=esg_score, annotation_position="top left",
                  fillcolor=fillcolor, opacity=0.15, line_width=1)

fig.add_vrect(x0=CBRE_period_end_date[0], x1=CBRE_period_end_date[1], col=2,annotation_text="Beta = 1.36", annotation_position="inside bottom right", opacity=0.15, line_width=0)    

for i in range(len(O_period_end_date) - 1):
    esg_score = O_esg_scores[i]
    fillcolor = color_score(esg_score)
    
    fig.add_vrect(x0=O_period_end_date[i], x1=O_period_end_date[i+1], col=3,
                  annotation_text=esg_score, annotation_position="top left",
                  fillcolor=fillcolor, opacity=0.15, line_width=1)

fig.add_vrect(x0=O_period_end_date[0], x1=O_period_end_date[1], col=3,annotation_text="Beta = 0.80", annotation_position="inside bottom right", opacity=0.15, line_width=0)    

fig.update_layout(title='Stock Price and ESG Rating of the Three Highest ESG Score Real Estate Companies in S&P500',
                  height=400,legend=dict(y=0.5, font_size=10)) 

fig.show()  

fig = px.line(low_score_df, facet_col="variable", facet_col_wrap=3, labels=dict(value="Stock Price", variable="Company"),
             color_discrete_sequence=["#fb0d0d", "#da16ff", "#750d86"], range_y=[0,250])
for i in range(len(EXR_period_end_date) - 1):
    esg_score = EXR_esg_scores[i]
    fillcolor = color_score(esg_score)
    
    fig.add_vrect(x0=EXR_period_end_date[i], x1=EXR_period_end_date[i+1], col=1,
                  annotation_text=esg_score, annotation_position="top left",
                  fillcolor=fillcolor, opacity=0.15, line_width=1)

fig.add_vrect(x0=EXR_period_end_date[0], x1=EXR_period_end_date[1], col=1,annotation_text="Beta = 0.56", annotation_position="inside bottom right", opacity=0.15, line_width=0)    

for i in range(len(INVH_period_end_date) - 1):
    esg_score = INVH_esg_scores[i]
    fillcolor = color_score(esg_score)
    
    fig.add_vrect(x0=INVH_period_end_date[i], x1=INVH_period_end_date[i+1], col=2,
                  annotation_text=esg_score, annotation_position="top left",
                  fillcolor=fillcolor, opacity=0.15, line_width=1)

fig.add_vrect(x0=INVH_period_end_date[0], x1=INVH_period_end_date[1], col=2,annotation_text="Beta = 0.90", annotation_position="inside bottom right", opacity=0.15, line_width=0)    

    
for i in range(len(VICI_period_end_date) - 1):
    esg_score = VICI_esg_scores[i]
    fillcolor = color_score(esg_score)
    
    fig.add_vrect(x0=VICI_period_end_date[i], x1=VICI_period_end_date[i+1], col=3,
                  annotation_text=esg_score, annotation_position="top left",
                  fillcolor=fillcolor, opacity=0.15, line_width=1)
    
fig.add_vrect(x0=VICI_period_end_date[0], x1=VICI_period_end_date[1], col=3,annotation_text="Beta = 0.96", annotation_position="inside bottom right", opacity=0.15, line_width=0)    
    
fig.update_layout(title='Stock Price and ESG Rating of the Three Lowest ESG Score Real Estate Companies in S&P500', 
                 height=400, legend=dict(y=0.5, font_size=10))  
fig.show()
Discussion - Results for the Real Estate Sector¶

We observe no clear differences between the top 3 and bottom 3 ESG-rated real estate companies in terms of stock volatility and beta. This differs from what we expected and from the reference paper.

C ) Negative Relationship Between ESG Score and Beta - Healthcare Sector¶

1) Identify the top three and bottom three ESG score companies in the Healthcare Sector:¶

In [23]:
Healthcare_Sector = Industry_SP500.loc[Industry_SP500["Sector"] == "Healthcare"]
Healthcare_Sector_high = Healthcare_Sector.sort_values("ESG Score (FY0)", ascending=False)
Healthcare_Sector_high.head(3)
Out[23]:
Company Name Sector Ticker Symbol Ticker Symbol (Simplified) ESG Score (FY0) Yoy Change in ESG Score in FY0 (%) ESG Score (FY-1) ESG Score Latest Update Date (FY0) Market Cap Beta Is S&P500
1784 Johnson & Johnson Healthcare JNJ.N JNJ 92.20 6.824238 86.31 31/03/2023 4.291241e+11 0.53 True
1785 CVS Health Corp Healthcare CVS.N CVS 88.89 1.414718 87.65 29/03/2023 9.957002e+10 0.60 True
1786 Agilent Technologies Inc Healthcare A.N A 88.74 1.533181 87.40 09/03/2023 4.081573e+10 1.03 True
In [24]:
Healthcare_Sector = Industry_SP500.loc[Industry_SP500["Sector"] == "Healthcare"]
Healthcare_Sector_low = Healthcare_Sector.sort_values("ESG Score (FY0)", ascending=True)
Healthcare_Sector_low.head(3)
Out[24]:
Company Name Sector Ticker Symbol Ticker Symbol (Simplified) ESG Score (FY0) Yoy Change in ESG Score in FY0 (%) ESG Score (FY-1) ESG Score Latest Update Date (FY0) Market Cap Beta Is S&P500
1935 Universal Health Services Inc Healthcare UHS.N UHS 45.53 30.533257 34.88 31/03/2023 8.972948e+09 1.25 True
1916 Dexcom Inc Healthcare DXCM.OQ DXCM 47.81 8.634401 44.01 29/03/2023 4.359738e+10 1.16 True
1899 Bio Rad Laboratories Inc Healthcare BIO.N BIO 51.78 6.171827 48.77 29/03/2023 1.378785e+10 0.91 True

2) Gather their weekly closing stock prices from 2011-01-01 to 2023-04-08 from Yahoo Finance:¶

1 - Read the csv file

2 - Convert the "Date" column into datetime format

3 - Rename the "Close" column to the company name

4 - Merge all the series

5 - Separate into a "high score dataframe" and a "low score dataframe"

In [25]:
JNJ = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/JNJ%20(1).csv")
JNJ['Date'] = pd.to_datetime(JNJ['Date'], format='%Y-%m-%d')
JNJ = JNJ.set_index(JNJ['Date'])
JNJ = JNJ[['Close']]
JNJ = JNJ.rename(columns={"Close": "Johnson & Johnson"})

CVS = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/CVS%20(1).csv")
CVS['Date'] = pd.to_datetime(CVS['Date'], format='%Y-%m-%d')
CVS = CVS.set_index(CVS['Date'])
CVS = CVS[['Close']]
CVS = CVS.rename(columns={"Close": "CVS Health Corp"})

A = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/A%20(1).csv")
A['Date'] = pd.to_datetime(A['Date'], format='%Y-%m-%d')
A = A.set_index(A['Date'])
A = A[['Close']]
A = A.rename(columns={"Close": "Agilent Technologies"})

UHS = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/UHS%20(1).csv")
UHS['Date'] = pd.to_datetime(UHS['Date'], format='%Y-%m-%d')
UHS = UHS.set_index(UHS['Date'])
UHS = UHS[['Close']]
UHS = UHS.rename(columns={"Close": "Universal Health Services Inc"})

DXCM = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/DXCM%20(1).csv")
DXCM['Date'] = pd.to_datetime(DXCM['Date'], format='%Y-%m-%d')
DXCM = DXCM.set_index(DXCM['Date'])
DXCM = DXCM[['Close']]
DXCM = DXCM.rename(columns={"Close": "Dexcom Inc"})

BIO = pd.read_csv("https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/BIO%20(1).csv")
BIO['Date'] = pd.to_datetime(BIO['Date'], format='%Y-%m-%d')
BIO = BIO.set_index(BIO['Date'])
BIO = BIO[['Close']]
BIO = BIO.rename(columns={"Close": "Bio Rad Laboratories Inc"})

merged_df = pd.merge(JNJ, CVS, left_index=True, right_index=True)
merged_df = pd.merge(merged_df, A, left_index=True, right_index=True)
merged_df = pd.merge(merged_df, UHS, left_index=True, right_index=True)
merged_df = pd.merge(merged_df, DXCM, left_index=True, right_index=True)
merged_df = pd.merge(merged_df, BIO, left_index=True, right_index=True)

high_score_df = merged_df[["Johnson & Johnson", "CVS Health Corp", "Agilent Technologies"]]
low_score_df = merged_df[["Universal Health Services Inc", "Dexcom Inc", "Bio Rad Laboratories Inc"]]

3) Gather their annual ESG rating (from D- to A+) from Refinitiv:¶

1 - Read the csv file

2 - Set the index

3 - Put the "period end date" and "esg score" rows into lists

(These lists feed a for loop that draws a vertical time band for each ESG grading period during visualization)

In [26]:
def load_esg_table(url):
    # Skip the two header rows, index by the label column, and pull out
    # the period end dates (row 0) and ESG scores (row 7) as lists
    score = pd.read_csv(url, skiprows=[0, 1]).set_index('Unnamed: 0')
    return score.iloc[0].tolist(), score.iloc[7].tolist()

base_url = "https://raw.githubusercontent.com/KOM-1020/Harvey-ECON-323/main/"

JNJ_period_end_date, JNJ_esg_scores = load_esg_table(base_url + "ESG%20Table%20for%20JNJ.N%20(1).csv")
CVS_period_end_date, CVS_esg_scores = load_esg_table(base_url + "ESG%20Table%20for%20CVS.N%20(1).csv")
A_period_end_date, A_esg_scores = load_esg_table(base_url + "ESG%20Table%20for%20A.N%20(1).csv")
UHS_period_end_date, UHS_esg_scores = load_esg_table(base_url + "ESG%20Table%20for%20UHS.N%20(1).csv")
DXCM_period_end_date, DXCM_esg_scores = load_esg_table(base_url + "ESG%20Table%20for%20DXCM.OQ%20(1).csv")
BIO_period_end_date, BIO_esg_scores = load_esg_table(base_url + "ESG%20Table%20for%20BIO.N%20(1).csv")

4) Visualization:¶

1 - Create time-series plots (one for the Top 3 and one for the Bottom 3)

2 - Loop over the Top 3 / Bottom 3 companies and draw a vertical band for each reporting period, showing the change in ESG grading over time

3 - Colour each band according to its ESG grade

4 - Annotate each company with its Beta

5 - Set the title, colour of each time series, axis ranges, etc.

In [27]:
def add_esg_annotations(fig, period_end_dates, esg_scores, col, beta_label):
    # Shade each reporting period by its ESG grade, then label the company's beta
    for i in range(len(period_end_dates) - 1):
        fig.add_vrect(x0=period_end_dates[i], x1=period_end_dates[i + 1], col=col,
                      annotation_text=esg_scores[i], annotation_position="top left",
                      fillcolor=color_score(esg_scores[i]), opacity=0.15, line_width=1)
    fig.add_vrect(x0=period_end_dates[0], x1=period_end_dates[1], col=col,
                  annotation_text=beta_label, annotation_position="inside bottom right",
                  opacity=0.15, line_width=0)

fig = px.line(high_score_df, facet_col="variable", facet_col_wrap=3,
              labels=dict(value="Stock Price", variable="Company"),
              color_discrete_sequence=["#862a16", "#a777f1", "#620042"], range_y=[0, 900])
add_esg_annotations(fig, JNJ_period_end_date, JNJ_esg_scores, col=1, beta_label="Beta = 0.53")
add_esg_annotations(fig, CVS_period_end_date, CVS_esg_scores, col=2, beta_label="Beta = 0.60")
add_esg_annotations(fig, A_period_end_date, A_esg_scores, col=3, beta_label="Beta = 1.03")
fig.update_layout(title='Stock Price and ESG Rating of the Three Highest ESG Score Healthcare Companies in S&P500',
                  height=400, legend=dict(y=0.5, font_size=10))
fig.show()

fig = px.line(low_score_df, facet_col="variable", facet_col_wrap=3,
              labels=dict(value="Stock Price", variable="Company"),
              color_discrete_sequence=["#1616a7", "#0d2a63", "#af0038"], range_y=[0, 900])
add_esg_annotations(fig, UHS_period_end_date, UHS_esg_scores, col=1, beta_label="Beta = 1.25")
add_esg_annotations(fig, DXCM_period_end_date, DXCM_esg_scores, col=2, beta_label="Beta = 1.16")
add_esg_annotations(fig, BIO_period_end_date, BIO_esg_scores, col=3, beta_label="Beta = 0.91")
fig.update_layout(title='Stock Price and ESG Rating of the Three Lowest ESG Score Healthcare Companies in S&P500',
                  height=400, legend=dict(y=0.5, font_size=10))
fig.show()
Discussion - The result of Healthcare Sector¶

It appears that the top 3 ESG-rated healthcare companies have smaller betas and lower volatility, and are more stable, than the bottom 3 ESG-rated healthcare companies. The Healthcare sector differs from the Financial and Real Estate sectors: the negative correlation here arises mainly because healthcare is a "defensive sector," where investors seek companies with stable returns during bad times.

Among the bottom 3 ESG-rated healthcare companies, Universal Health Services and Dexcom show higher fluctuation and uncertainty than the top 3; however, they do not reward investors with any excess return. Bio-Rad Laboratories is the exception, as it is not characterized as a "defensive stock": it has experienced both a large run-up and a slump over the past 3 years.

Problem with ESG Score¶

In the next section, we isolate the influence of ESG scores within sectors and run multi-factor machine learning models to answer questions such as: are ESG factors significant in stock prediction? For this we switch to the Bloomberg ESG score, because Bloomberg applies a distinct grading framework to each industry, whereas Refinitiv's framework is less comprehensive in its industry coverage. We therefore regard Bloomberg as more accurate and reliable, and use its ESG scores in the machine learning analysis that follows.

ESG investing and Multi-Factor Machine Learning Modeling¶

VII - Literature Review¶

1. Factor Construction¶

Financial markets are rife with uncertainty, making it extremely hard to capture excess stock returns; otherwise, everyone in the world would be extremely rich. One famous approach scholars use to predict stock returns is factor construction (proven highly effective at interpreting the factors driving the market, and widely used by large asset management firms such as D.E. Shaw). Multi-factor models attempt to describe asset returns and their covariance matrix as a function of a limited number of factor attributes.

Many scholars now identify ESG as new risk factors, similar to the original FF factors (Hübel and Scholz, 2020; Jin, 2018; Maiti, 2020). We apply factor construction to ESG scores to test whether they are valid new factors that help with stock prediction (FF, 1996, 2015, 2017). We also incorporate the most commonly used fundamental factors (net income, PE, PB, momentum, ROA, ROE, revenue, debt ratio). Portfolio managers who integrate sustainability in their investment portfolios undertake a dual optimisation process that combines ESG strategies with fundamental valuation, which is why our factor portfolio combines ESG factors with fundamental factors.

Traditionally, scholars use linear regression for multi-factor modelling, as follows. This builds on one of the most famous models in asset pricing, the CAPM: $$ R = \alpha + \sum_{j=1}^m \beta_{1_j}\,\text{ESG Factor}_j + \sum_{i=1}^n \beta_{2_i}\,\text{PCA Factor}_i + \beta_3\,\text{Covid Factor} + \varepsilon $$

Where $R$ represents the return, $\alpha$ is the intercept, $\beta_{1_j}$, $\beta_{2_i}$, and $\beta_3$ are the factor loadings, and $\varepsilon$ is the error term. The factors considered are a series of $m$ ESG (Environmental, Social, and Governance) factors denoted $\text{ESG Factor}_j$, a series of $n$ PCA (Principal Component Analysis) factors denoted $\text{PCA Factor}_i$, and a Covid factor.

However, with the development of quantitative finance, it is increasingly popular to combine factors with more complex machine learning models to help capture non-linear trends.

2. Machine Learning¶

A) Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a statistical technique widely employed by financial scholars (Yu et al., 2018) to reduce the dimensionality of variables in a given dataset. This method typically involves applying covariance analysis among various attributes. The original data is transformed into a new coordinate system based on the variance within the data. PCA uses a mathematical procedure to convert correlated variables into linearly uncorrelated variables called principal components.

To reduce the dimensionality of the transformed data, only the first few components are considered, as the first principal component accounts for the largest variance in the data, and each subsequent component accounts for as much of the remaining variability as possible. PCA is particularly useful when there are numerous correlated dimensions, which often contain significant data redundancy. PCA can be utilized to decrease this redundancy, resulting in the reduction of highly correlated data into a smaller number of uncorrelated principal components that account for most of the variance in the original data.

In our model, several fundamental factors, such as ROE and PE, are correlated. PCA can help us reduce dimensionality, thereby decreasing complexity and redundancy.

In Principal Component Analysis (PCA), the covariance matrix is computed, denoted as $\mathbf{S}$:

$$ \mathbf{S} = \frac{1}{N-1} \mathbf{X}^T \mathbf{X} $$

Where $\mathbf{X}$ is the original data matrix, and $N$ is the number of observations.

Next, the eigenvectors and eigenvalues of the covariance matrix $\mathbf{S}$ are computed. Let $\mathbf{W}$ be the matrix of eigenvectors, and $\boldsymbol{\Lambda}$ the diagonal matrix of eigenvalues:

$$ \mathbf{S}\mathbf{W} = \mathbf{W}\boldsymbol{\Lambda} $$

The eigenvectors are sorted in descending order based on their corresponding eigenvalues. The eigenvector associated with the highest eigenvalue corresponds to the first principal component, which accounts for the largest variance in the data. Subsequent eigenvectors are chosen to be orthogonal to the previous ones and account for the maximum remaining variance.

The matrix $\mathbf{Z}$ of principal components is obtained by projecting the original data matrix $\mathbf{X}$ onto the eigenvectors:

$$ \mathbf{Z} = \mathbf{XW} $$

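The steps above can be verified numerically. The sketch below uses three synthetic, highly correlated columns standing in for correlated fundamental factors; it computes $\mathbf{S}$, its eigendecomposition, and $\mathbf{Z}$, and confirms that the first component captures almost all of the variance.

```python
import numpy as np

rng = np.random.default_rng(42)

# Three highly correlated "fundamental factors" (synthetic illustration)
base = rng.normal(size=(500, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(500, 1)) for _ in range(3)])
X = X - X.mean(axis=0)  # PCA assumes centred data

# Covariance matrix S = X^T X / (N - 1)
S = X.T @ X / (X.shape[0] - 1)

# Eigendecomposition S W = W Lambda, sorted by descending eigenvalue
eigvals, W = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, W = eigvals[order], W[:, order]

# Principal components Z = X W; the first explains most of the variance
Z = X @ W
explained = eigvals / eigvals.sum()
print(explained.round(3))
```

Because the three columns are nearly collinear, the first principal component alone explains well over 90% of the variance, which is the redundancy reduction PCA buys us.
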
Financial practitioners often apply machine learning techniques to make accurate predictions and optimize investment strategies. It is worth noting that some studies find ESG scores and fundamental variables are related non-linearly to stock performance (Pu, 2022; Ahmah et al., 2021), while others find the relationship can be linear (Han, 2016). This is why, in the following models, we use both linear and non-linear models to see how the factors perform in predicting stock returns. In this paper, linear regression (linear), and support vector regression with an RBF kernel, random forest, and gradient boosting (non-linear) will be utilized.

B) Linear Regression

Linear Regression is a simple machine learning technique that models the relationship between a dependent variable and one or more independent variables; it is often called the "Hello World" of quantitative finance and is the easiest model for multi-factor modelling. The function minimized in Linear Regression is the Mean Squared Error (MSE).

$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

Where:

  • $n$ is the number of observations
  • $y_i$ represents the actual values of the dependent variable
  • $\hat{y}_i$ denotes the predicted values of the dependent variable

Minimizing the MSE ensures that the model's predictions are as close as possible to the actual data points.

C) Support Vector Regression (SVR)

Support Vector Regression is an advanced machine learning technique derived from Support Vector Machines (SVMs). It aims to find the best-fitting line (or hyperplane) while allowing a certain degree of error for data points. The main idea behind SVR is to minimize the model's complexity while maintaining a certain level of accuracy. In this paper, we use the RBF kernel to capture non-linear attributes of the data.

For the RBF kernel, the Support Vector Regression (SVR) model still aims to minimize the same objective function as mentioned before. However, the main difference lies in the transformation of the input data using the RBF kernel. Here is the objective function for SVR with the RBF kernel:

Objective Function (SVR with RBF Kernel):

$$ \min_{w,b} \frac{1}{2} ||w||^2 + C \sum_{i=1}^{n} (L_\epsilon(y_i, \hat{y}_i)) $$

Where:

  • $w$ represents the weight vector
  • $b$ denotes the bias term
  • $C$ is a regularization parameter
  • $L_\epsilon(y, \hat{y})$ is the Epsilon-Insensitive Loss function

Minimizing this objective function ensures that the model strikes a balance between accuracy and complexity, avoiding overfitting (Rohmah, 2021).

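The epsilon-insensitive loss $L_\epsilon$ is simple to state in code. A minimal sketch (the tolerance value here is arbitrary): errors inside the epsilon tube cost nothing, and larger errors are penalized linearly.

```python
import numpy as np

def epsilon_insensitive_loss(y, y_hat, epsilon=0.1):
    """L_eps(y, y_hat): zero inside the epsilon tube, linear outside it."""
    return np.maximum(0.0, np.abs(y - y_hat) - epsilon)

y = np.array([1.0, 1.0, 1.0])
y_hat = np.array([1.05, 1.25, 1.40])
print(epsilon_insensitive_loss(y, y_hat))  # first error lies inside the tube
```
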
D) Random Forest

Random Forest is an ensemble learning technique that builds multiple Decision Trees and combines their predictions through a majority vote or averaging. This model aims to minimize the variance and reduce overfitting by averaging the predictions of individual trees. It uses bagging (Bootstrap Aggregating) to create different training subsets for each tree, ensuring diversity.

In classification, Random Forest minimizes the Gini impurity or the entropy at each split, depending on the chosen criterion; in a regression setting such as ours, each split instead minimizes the variance (MSE) of the targets.

Gini Impurity:

$$ Gini(p) = 1 - \sum_{i=1}^c p_i^2 $$

Entropy:

$$ Entropy(p) = -\sum_{i=1}^c p_i \log_2 p_i $$

Where:

  • $c$ is the number of classes
  • $p_i$ is the probability of class $i$

By minimizing these impurity measures, Random Forest ensures that each split in the Decision Trees is as pure as possible, leading to accurate predictions.

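Both impurity measures are one-liners. As a quick check in a classification setting, an evenly split two-class node is maximally impure, while a pure node scores zero under both measures.

```python
import numpy as np

def gini(p):
    """Gini impurity: 1 - sum(p_i^2)."""
    p = np.asarray(p)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    """Shannon entropy in bits: -sum(p_i log2 p_i)."""
    p = np.asarray(p)
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return -np.sum(p * np.log2(p))

print(gini([0.5, 0.5]), entropy([0.5, 0.5]))  # maximally impure two-class node
print(gini([1.0, 0.0]), entropy([1.0, 0.0]))  # pure node: both measures are 0
```
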
E) Gradient Boost

Gradient Boost is another ensemble learning technique that focuses on building Decision Trees sequentially. Each new tree aims to correct the errors made by the previous trees in the ensemble. Gradient Boost minimizes a loss function by applying the gradient descent algorithm and sequentially fitting weak learners (shallow Decision Trees) to the negative gradients of the loss function.

The objective function that Gradient Boost attempts to minimize is the sum of the loss function and a regularization term:

Objective Function (Gradient Boost):

$$ \min_{F} \sum_{i=1}^{n} L(y_i, F(x_i)) + \sum_{m=1}^M \Omega(f_m) $$

Where:

  • $F(x_i)$ represents the ensemble's prediction for data point $x_i$
  • $L(y_i, F(x_i))$ is the loss function
  • $M$ is the number of weak learners (Decision Trees) in the ensemble
  • $\Omega(f_m)$ is a regularization term for the $m$-th weak learner

In general, gradient boosting gives a more accurate prediction because its tree-building method fits reality better (Ran, 2018). The difference lies in the method each model uses. Random Forests operate under the assumption that, on average, individual decision trees produce accurate predictions, with random noise being the primary factor hindering optimal performance; ensembles are therefore employed to mitigate the impact of this noise. In contrast, Gradient Boosting posits that a single decision tree may habitually deviate from the desired target, necessitating the use of ensembles to systematically guide the trees towards the correct objective (Ran, 2018).

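As a sketch of the comparison we run later (synthetic data with a deliberately non-linear factor and scikit-learn default hyperparameters; not our actual dataset or tuning), the four models can be fit side by side and ranked by test MSE:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 7))  # 7 synthetic factors
# Target depends non-linearly on factor 0 and linearly on factor 1
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "SVR (RBF)": SVR(kernel="rbf", C=1.0),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}
mse = {name: mean_squared_error(y_te, m.fit(X_tr, y_tr).predict(X_te))
       for name, m in models.items()}
for name, err in sorted(mse.items(), key=lambda kv: kv[1]):
    print(f"{name}: {err:.4f}")
```

On this non-linear toy target, the tree ensembles beat linear regression; on real returns data, of course, the ranking is an empirical question, which is what our results section addresses.
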
In this project, we are trying to answer three research questions:

    1. Do investors care about ESG? Are ESG factors significant in helping stock prediction?
    2. Are machine learning models incorporating ESG factors accurate? In other words, how do these models perform on stock prediction in terms of accuracy on the test dataset?
    3. Can machine learning models incorporating ESG factors serve as an investment strategy? In other words, do such models perform well on stock portfolio construction?

VIII - Methodology¶

We retrieve each S&P 500 stock's quarterly E-score, S-score, G-score, Price-Earnings ratio, Price-to-Book ratio, Relative Share Price Momentum, ROE, ROA, Net Income, Debt Ratio, Market Capitalization, Industry, and Last Price from Bloomberg, covering 2016-01 to 2022-10. We calculate quarterly returns and remove outliers in the factors. Because factors are not comparable across industry sectors (Naffa, 2022), we follow Morningstar's approach and standardize them (Justice and Hale, 2016).

$$ \text{z-score}_{\text{factor}_i} = \frac{\text{factor}_i - \mu_{\text{peer}}}{\sigma_{\text{peer}}} $$

Where:

  • $\text{z-score}_{\text{factor}_i}$: Standardized value for factor $i$
  • $\text{factor}_i$: Factor $i$
  • $\mu_{\text{peer}}$: Mean of Peer
  • $\sigma_{\text{peer}}$: Standard Deviation of Peer

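In pandas, this peer standardization is a groupby-transform. A toy sketch with made-up P/E numbers for two hypothetical industry peer groups:

```python
import pandas as pd

# Hypothetical numbers: standardize a factor within each industry peer group
df = pd.DataFrame({
    "Industry": ["Health Care", "Health Care", "Health Care",
                 "Financials", "Financials", "Financials"],
    "PE": [20.0, 30.0, 40.0, 8.0, 10.0, 12.0],
})

grp = df.groupby("Industry")["PE"]
df["PE_z"] = (df["PE"] - grp.transform("mean")) / grp.transform("std")
print(df)
```

After standardization, a P/E of 40 in Health Care and a P/E of 12 in Financials both map to a z-score of 1: each stock is measured against its own peers, not the whole market.
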
We then neutralize the effects of industry and market cap, following scholars (Naffa, 2022); as our visualization section showed, industry is strongly associated with ESG factors, and we want to keep it from contaminating the analysis. In other words, market cap and industry are kept as control variables.

Next, we use PCA to reduce the fundamental factors (Price-Earnings ratio, Price-to-Book ratio, Relative Share Price Momentum, ROE, ROA, Net Income, Debt Ratio) to three components (Profitability, Leverage, and Momentum). We also add a Covid dummy variable to capture the effect of Covid on stock returns.

The final set contains 7 factors: E-score, S-score, G-score, PCA 1, PCA 2, PCA 3, and Covid.

Lastly, we apply linear regression, support vector regression, random forest, and gradient boosting to construct stock portfolios based on the factors. Our purpose is to see whether ESG scores are significant in helping stock prediction, and how machine learning models incorporating ESG factors perform on stock portfolio construction relative to the S&P 500 Index. To be more specific about portfolio construction: we select the top 5 stocks by the returns predicted by each machine learning model, buy at the start of the quarter, and sell at the start of the next quarter. These 5 stocks constitute the portfolio, whose return is compared with the S&P 500 to see whether it beats the index. In this way, we can judge whether an investment strategy built on machine learning models incorporating ESG factors is a good one to pursue.

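The selection rule can be sketched in a few lines. The predictions and realized returns below are hypothetical, for one quarter and with equal weighting; our actual pipeline repeats this every quarter for each model.

```python
import pandas as pd

# Hypothetical predicted and realized quarterly returns for seven stocks
preds = pd.DataFrame({
    "Instrument": ["A", "B", "C", "D", "E", "F", "G"],
    "Predicted": [0.08, 0.02, 0.11, -0.03, 0.05, 0.09, 0.01],
    "Realized":  [0.06, 0.01, 0.04, -0.05, 0.07, 0.10, 0.02],
})

# Buy the 5 stocks with the highest predicted return, equal-weighted
top5 = preds.nlargest(5, "Predicted")
portfolio_return = top5["Realized"].mean()
print(top5["Instrument"].tolist(), round(portfolio_return, 4))
```

The resulting portfolio return would then be compared against the S&P 500's return over the same quarter.
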
IX - Data Retrieving, Data Cleaning & Data Manipulation¶

I will first retrieve the data from an Excel file containing data from Bloomberg. Then, I do some data cleaning and manipulation, in accordance with the methodology, to obtain the final dataset used for modelling.

In [ ]:
import pandas as pd
import numpy as np
pd.set_option('mode.chained_assignment', None)

data = pd.read_excel("./data.xlsx")
In [35]:
data.head(14)
Out[35]:
Instrument columns 2016-01-31 00:00:00 2016-04-30 00:00:00 2016-07-31 00:00:00 2016-10-31 00:00:00 2017-01-31 00:00:00 2017-04-30 00:00:00 2017-07-31 00:00:00 2017-10-31 00:00:00 ... 2020-07-31 00:00:00 2020-10-31 00:00:00 2021-01-31 00:00:00 2021-04-30 00:00:00 2021-07-31 00:00:00 2021-10-31 00:00:00 2022-01-31 00:00:00 2022-04-30 00:00:00 2022-07-31 00:00:00 2022-10-31 00:00:00
0 A UN Equity BESG Environmental Pillar Score NaN NaN NaN 1.5000 1.5000 1.5000 1.5000 1.5000 ... 1.5000 1.5000 1.5000 1.5000 1.5000 3.0000 3.0000 3.0000 3.0000 3.0000
1 NaN BESG Social Pillar Score NaN NaN NaN 1.6400 1.6400 1.6400 1.6400 1.6000 ... 1.8000 5.1400 5.1400 5.1400 5.1400 7.6600 7.6600 7.6600 7.6600 7.6600
2 NaN BESG Governance Pillar Score NaN NaN NaN 5.9200 5.9200 5.9200 5.9200 6.3500 ... 6.6800 6.5600 6.5600 6.5600 6.5600 6.7300 6.7300 6.7300 6.7300 6.7300
3 NaN Price Earnings Ratio (P/E) 24.9317 26.7484 30.7221 27.0396 27.9601 28.5429 31.6935 34.5991 ... 37.4695 38.6665 41.1288 41.5718 44.3922 42.2991 35.1326 28.7858 30.5493 29.4270
4 NaN Price to Book Ratio 3.0530 3.1953 3.5891 3.3271 3.6696 4.0419 4.1780 4.5344 ... 5.9677 6.4107 7.6294 8.4185 9.3871 8.8258 8.1094 6.9625 7.7968 7.6934
5 NaN Relative Share Price Momentum 4.0554 9.8799 8.5025 7.9585 -2.4319 6.5499 16.6943 14.3360 ... 9.2830 12.1779 12.6658 4.8849 4.6442 8.1828 -9.3599 -16.2913 0.4189 21.4459
6 NaN Return on Common Equity 11.0843 11.1298 11.2716 10.9869 12.2033 13.6396 14.1437 15.0760 ... 14.2064 14.9465 16.7841 19.3151 19.9456 23.5821 24.2016 25.4329 26.4621 23.4524
7 NaN Return on Assets 6.1939 6.1520 6.3530 6.0499 6.7088 7.4348 7.9150 8.4340 ... 7.6055 7.5371 8.4485 9.3185 9.8817 11.9024 12.0494 12.1134 12.6627 11.8096
8 NaN Net Income/Net Profit (Losses) 121.0000 91.0000 124.0000 126.0000 168.0000 164.0000 175.0000 177.0000 ... 199.0000 222.0000 288.0000 216.0000 264.0000 442.0000 283.0000 274.0000 329.0000 368.0000
9 NaN Revenue 1028.0000 1019.0000 1044.0000 1111.0000 1067.0000 1102.0000 1114.0000 1189.0000 ... 1261.0000 1483.0000 1548.0000 1525.0000 1586.0000 1660.0000 1674.0000 1607.0000 1718.0000 1849.0000
10 NaN Total Debt to Total Equity 42.8113 45.3541 43.4192 44.8422 46.3256 46.6971 45.1312 41.5926 ... 50.3513 52.0624 55.6828 64.9272 61.5447 54.0174 52.9686 56.7161 60.3614 55.0613
11 NaN Last Price 37.6500 40.9200 48.1100 43.5700 48.9700 55.0500 59.7900 68.0300 ... 96.3300 102.0900 120.1700 133.6400 153.2300 157.4900 139.3200 119.2700 134.1000 138.3500
12 NaN Current Market Cap 12340.4987 13412.3517 15660.8489 14133.4442 15755.9944 17742.6647 19212.7657 21893.9590 ... 29744.5266 31475.3306 36874.1075 40719.7759 46496.6054 47675.7930 42074.7510 35794.5225 40056.7503 40957.2129
13 AAL UW Equity BESG Environmental Pillar Score NaN NaN NaN 2.1200 2.1200 2.1200 2.1200 2.1800 ... 4.1800 2.8400 2.8400 2.8400 2.8400 5.3200 5.3200 5.3200 5.3200 5.3200

14 rows × 30 columns

In [36]:
data['Instrument'] = data['Instrument'].fillna(method='ffill')
data.head(3)
Out[36]:
Instrument columns 2016-01-31 00:00:00 2016-04-30 00:00:00 2016-07-31 00:00:00 2016-10-31 00:00:00 2017-01-31 00:00:00 2017-04-30 00:00:00 2017-07-31 00:00:00 2017-10-31 00:00:00 ... 2020-07-31 00:00:00 2020-10-31 00:00:00 2021-01-31 00:00:00 2021-04-30 00:00:00 2021-07-31 00:00:00 2021-10-31 00:00:00 2022-01-31 00:00:00 2022-04-30 00:00:00 2022-07-31 00:00:00 2022-10-31 00:00:00
0 A UN Equity BESG Environmental Pillar Score NaN NaN NaN 1.50 1.50 1.50 1.50 1.50 ... 1.50 1.50 1.50 1.50 1.50 3.00 3.00 3.00 3.00 3.00
1 A UN Equity BESG Social Pillar Score NaN NaN NaN 1.64 1.64 1.64 1.64 1.60 ... 1.80 5.14 5.14 5.14 5.14 7.66 7.66 7.66 7.66 7.66
2 A UN Equity BESG Governance Pillar Score NaN NaN NaN 5.92 5.92 5.92 5.92 6.35 ... 6.68 6.56 6.56 6.56 6.56 6.73 6.73 6.73 6.73 6.73

3 rows × 30 columns

In [37]:
# the last column contains no usable data, so drop it
data = data.drop(columns=data.columns[-1])
In [38]:
# Reshape the DataFrame using melt
data_melted = data.melt(id_vars=['Instrument', 'columns'], var_name='Date', value_name='Value')

# Forward-fill instrument names again after melting
data_melted['Instrument'] = data_melted['Instrument'].fillna(method='ffill')

# Pivot so each metric becomes its own column
data_pivoted = data_melted.pivot_table(index=['Instrument', 'Date'], columns='columns', values='Value')

# Reset the index to have 'Instrument' and 'Date' as columns again
data_pivoted.reset_index(inplace=True)

data_pivoted
Out[38]:
columns Instrument Date BESG Environmental Pillar Score BESG Governance Pillar Score BESG Social Pillar Score Current Market Cap Last Price Net Income/Net Profit (Losses) Price Earnings Ratio (P/E) Price to Book Ratio Relative Share Price Momentum Return on Assets Return on Common Equity Revenue Total Debt to Total Equity
0 A UN Equity 2016-01-31 NaN NaN NaN 12340.4987 37.65 121.0 24.9317 3.0530 4.0554 6.1939 11.0843 1028.0 42.8113
1 A UN Equity 2016-04-30 NaN NaN NaN 13412.3517 40.92 91.0 26.7484 3.1953 9.8799 6.1520 11.1298 1019.0 45.3541
2 A UN Equity 2016-07-31 NaN NaN NaN 15660.8489 48.11 124.0 30.7221 3.5891 8.5025 6.3530 11.2716 1044.0 43.4192
3 A UN Equity 2016-10-31 1.50 5.92 1.64 14133.4442 43.57 126.0 27.0396 3.3271 7.9585 6.0499 10.9869 1111.0 44.8422
4 A UN Equity 2017-01-31 1.50 5.92 1.64 15755.9944 48.97 168.0 27.9601 3.6696 -2.4319 6.7088 12.2033 1067.0 46.3256
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
13427 ZTS UN Equity 2021-07-31 4.59 7.25 3.47 92011.5256 194.14 552.0 42.1919 19.6411 16.4668 14.4513 47.8686 1990.0 144.2427
13428 ZTS UN Equity 2021-10-31 4.59 7.19 3.47 115456.8943 244.03 414.0 52.1302 25.3846 15.3516 14.8097 49.0135 1967.0 149.2958
13429 ZTS UN Equity 2022-01-31 4.59 7.19 3.47 88976.7529 188.59 595.0 40.1436 19.0780 -4.4566 14.9913 47.3991 1986.0 144.6114
13430 ZTS UN Equity 2022-04-30 4.59 7.19 3.47 80896.3595 171.89 529.0 36.5053 17.5873 -14.7083 15.0149 46.8085 2052.0 147.0306
13431 ZTS UN Equity 2022-07-31 4.59 7.19 3.47 69420.2682 148.29 529.0 31.7942 14.8335 -6.0819 15.0992 44.2518 2002.0 144.7662

13432 rows × 15 columns

Now, because the industry column is missing, I will merge another dataset, containing each instrument's industry, into the data_pivoted dataset.

In [39]:
# strip the ' UN Equity' suffix so tickers match the industry dataset
data_pivoted['Instrument']=data_pivoted['Instrument'].replace(' UN Equity', '', regex=True)

data_pivoted.head()
Out[39]:
columns Instrument Date BESG Environmental Pillar Score BESG Governance Pillar Score BESG Social Pillar Score Current Market Cap Last Price Net Income/Net Profit (Losses) Price Earnings Ratio (P/E) Price to Book Ratio Relative Share Price Momentum Return on Assets Return on Common Equity Revenue Total Debt to Total Equity
0 A 2016-01-31 NaN NaN NaN 12340.4987 37.65 121.0 24.9317 3.0530 4.0554 6.1939 11.0843 1028.0 42.8113
1 A 2016-04-30 NaN NaN NaN 13412.3517 40.92 91.0 26.7484 3.1953 9.8799 6.1520 11.1298 1019.0 45.3541
2 A 2016-07-31 NaN NaN NaN 15660.8489 48.11 124.0 30.7221 3.5891 8.5025 6.3530 11.2716 1044.0 43.4192
3 A 2016-10-31 1.5 5.92 1.64 14133.4442 43.57 126.0 27.0396 3.3271 7.9585 6.0499 10.9869 1111.0 44.8422
4 A 2017-01-31 1.5 5.92 1.64 15755.9944 48.97 168.0 27.9601 3.6696 -2.4319 6.7088 12.2033 1067.0 46.3256
In [43]:
industry = pd.read_csv('/Users/daltondu/Documents/UBC/323/project/data/industry.csv')
In [44]:
merged_df = data_pivoted.merge(industry, on='Instrument')

merged_df.head(3)
Out[44]:
Instrument Date BESG Environmental Pillar Score BESG Governance Pillar Score BESG Social Pillar Score Current Market Cap Last Price Net Income/Net Profit (Losses) Price Earnings Ratio (P/E) Price to Book Ratio Relative Share Price Momentum Return on Assets Return on Common Equity Revenue Total Debt to Total Equity Industry
0 A 2016-01-31 NaN NaN NaN 12340.4987 37.65 121.0 24.9317 3.0530 4.0554 6.1939 11.0843 1028.0 42.8113 Health Care
1 A 2016-04-30 NaN NaN NaN 13412.3517 40.92 91.0 26.7484 3.1953 9.8799 6.1520 11.1298 1019.0 45.3541 Health Care
2 A 2016-07-31 NaN NaN NaN 15660.8489 48.11 124.0 30.7221 3.5891 8.5025 6.3530 11.2716 1044.0 43.4192 Health Care
In [45]:
merged_df['Date'] = pd.to_datetime(merged_df['Date'])

# Format the 'Date' column as year-month
merged_df['Date'] = merged_df['Date'].dt.strftime('%Y-%m')

merged_df.head(1)
Out[45]:
Instrument Date BESG Environmental Pillar Score BESG Governance Pillar Score BESG Social Pillar Score Current Market Cap Last Price Net Income/Net Profit (Losses) Price Earnings Ratio (P/E) Price to Book Ratio Relative Share Price Momentum Return on Assets Return on Common Equity Revenue Total Debt to Total Equity Industry
0 A 2016-01 NaN NaN NaN 12340.4987 37.65 121.0 24.9317 3.053 4.0554 6.1939 11.0843 1028.0 42.8113 Health Care

Here we have the rough shape of the dataframe we want. Next, we will fill in missing data in all columns. It is important to note that a missing value should only be filled by a previous or next period's value from the same instrument. We accomplish this as follows:

In [46]:
#see how many values are missing
pd.isna(merged_df).sum()
Out[46]:
Instrument                            0
Date                                  0
BESG Environmental Pillar Score    1227
BESG Governance Pillar Score        744
BESG Social Pillar Score           1227
Current Market Cap                   24
Last Price                           22
Net Income/Net Profit (Losses)        1
Price Earnings Ratio (P/E)           63
Price to Book Ratio                 199
Relative Share Price Momentum        34
Return on Assets                     14
Return on Common Equity             249
Revenue                               1
Total Debt to Total Equity          170
Industry                              0
dtype: int64
In [48]:
#fill up missing cells

fill_df = pd.DataFrame()
for instrument in merged_df.Instrument.unique():

    # fill only within the same instrument, so values never leak across instruments
    s_df = merged_df[merged_df['Instrument'] == instrument]
    s_df = s_df.fillna(method = "ffill").fillna(method="bfill")
    fill_df = pd.concat([fill_df, s_df])

fill_df.head()
Out[48]:
Instrument Date BESG Environmental Pillar Score BESG Governance Pillar Score BESG Social Pillar Score Current Market Cap Last Price Net Income/Net Profit (Losses) Price Earnings Ratio (P/E) Price to Book Ratio Relative Share Price Momentum Return on Assets Return on Common Equity Revenue Total Debt to Total Equity Industry
0 A 2016-01 1.5 5.92 1.64 12340.4987 37.65 121.0 24.9317 3.0530 4.0554 6.1939 11.0843 1028.0 42.8113 Health Care
1 A 2016-04 1.5 5.92 1.64 13412.3517 40.92 91.0 26.7484 3.1953 9.8799 6.1520 11.1298 1019.0 45.3541 Health Care
2 A 2016-07 1.5 5.92 1.64 15660.8489 48.11 124.0 30.7221 3.5891 8.5025 6.3530 11.2716 1044.0 43.4192 Health Care
3 A 2016-10 1.5 5.92 1.64 14133.4442 43.57 126.0 27.0396 3.3271 7.9585 6.0499 10.9869 1111.0 44.8422 Health Care
4 A 2017-01 1.5 5.92 1.64 15755.9944 48.97 168.0 27.9601 3.6696 -2.4319 6.7088 12.2033 1067.0 46.3256 Health Care
In [50]:
fill_df = fill_df.fillna(0)
In [51]:
fill_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7717 entries, 0 to 7716
Data columns (total 16 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   Instrument                       7717 non-null   object 
 1   Date                             7717 non-null   object 
 2   BESG Environmental Pillar Score  7717 non-null   float64
 3   BESG Governance Pillar Score     7717 non-null   float64
 4   BESG Social Pillar Score         7717 non-null   float64
 5   Current Market Cap               7717 non-null   float64
 6   Last Price                       7717 non-null   float64
 7   Net Income/Net Profit (Losses)   7717 non-null   float64
 8   Price Earnings Ratio (P/E)       7717 non-null   float64
 9   Price to Book Ratio              7717 non-null   float64
 10  Relative Share Price Momentum    7717 non-null   float64
 11  Return on Assets                 7717 non-null   float64
 12  Return on Common Equity          7717 non-null   float64
 13  Revenue                          7717 non-null   float64
 14  Total Debt to Total Equity       7717 non-null   float64
 15  Industry                         7717 non-null   object 
dtypes: float64(13), object(3)
memory usage: 1.0+ MB
In [61]:
#change Last Price to quarterly return

fill_df.sort_values(by='Date', inplace=True)

def calculate_quarterly_return(group):
    group['Quarterly Return'] = group['Last Price'].pct_change(periods=1)
    return group

# Group by 'Instrument' and apply the function to each group;
# group_keys=False keeps the original index (and silences the pandas FutureWarning)
fill_df = fill_df.groupby('Instrument', group_keys=False).apply(calculate_quarterly_return)

fill_df.head()
Out[61]:
Instrument Date BESG Environmental Pillar Score BESG Governance Pillar Score BESG Social Pillar Score Current Market Cap Last Price Net Income/Net Profit (Losses) Price Earnings Ratio (P/E) Price to Book Ratio Relative Share Price Momentum Return on Assets Return on Common Equity Revenue Total Debt to Total Equity Industry Quarterly Return
0 A 2016-01 1.50 5.92 1.64 12340.4987 37.65 121.000 24.9317 3.0530 4.0554 6.1939 11.0843 1028.000 42.8113 Health Care NaN
6313 SHW 2016-01 3.29 7.00 0.69 26247.3745 94.89 164.876 23.1355 26.3089 12.1670 18.2879 147.8293 2574.024 203.7951 Materials NaN
3343 HIG 2016-01 0.06 7.80 2.07 18278.8242 46.08 323.000 11.8106 1.0065 -4.9065 0.6483 8.2713 4391.000 29.3894 Financials NaN
2777 FCX 2016-01 5.22 7.31 3.95 12944.1275 10.34 -4184.000 9.4625 3.5064 -9.5850 -28.2511 -143.1460 3242.000 237.8047 Materials NaN
216 AES 2016-01 2.24 8.23 5.02 7784.8534 11.80 126.000 11.2824 2.4293 1.6587 0.7744 8.0299 3271.000 299.8396 Utilities NaN
In [63]:
#drop NaN rows (each stock's first quarter) and drop the Last Price column

fill_df = fill_df.drop(columns='Last Price')
fill_df = fill_df.dropna()
In [64]:
fill_df.head()
Out[64]:
Instrument Date BESG Environmental Pillar Score BESG Governance Pillar Score BESG Social Pillar Score Current Market Cap Net Income/Net Profit (Losses) Price Earnings Ratio (P/E) Price to Book Ratio Relative Share Price Momentum Return on Assets Return on Common Equity Revenue Total Debt to Total Equity Industry Quarterly Return
6071 RHI 2016-04 0.00 6.67 1.71 5011.1179 91.616 13.6774 4.6433 -18.3080 20.8316 35.0449 1344.160 0.1014 Industrials -0.180764
6476 SPGI 2016-04 4.69 8.08 1.94 28402.4467 383.000 23.8403 75.2817 9.7340 16.4569 206.6725 1482.000 280.9665 Financials 0.083653
5369 ORCL 2016-04 4.68 6.04 0.83 166824.5824 2814.000 18.6499 3.5117 2.0127 7.9791 18.5509 10594.000 91.7661 Information Technology 0.092985
4343 LOW 2016-04 1.63 6.94 1.68 68223.3128 884.000 21.7946 9.4234 2.2649 7.7012 32.9439 15234.000 213.6023 Consumer Discretionary 0.060843
4451 MAA 2016-04 0.00 7.07 1.50 8034.2374 45.144 48.5020 2.7187 10.5362 3.2565 7.5275 272.236 111.4943 Real Estate 0.040994
In [65]:
fill_df.to_csv('./final.csv')

Clean and Rename Columns, Remove Extreme Values, Standardize Data & Neutralize¶

In [26]:
import numpy as np
import pandas as pd

data = pd.read_csv('./final.csv')
In [27]:
data.head(3)
Out[27]:
Unnamed: 0 Instrument Date BESG Environmental Pillar Score BESG Governance Pillar Score BESG Social Pillar Score Current Market Cap Net Income/Net Profit (Losses) Price Earnings Ratio (P/E) Price to Book Ratio Relative Share Price Momentum Return on Assets Return on Common Equity Revenue Total Debt to Total Equity Industry Quarterly Return
0 6071 RHI 2016-04 0.00 6.67 1.71 5011.1179 91.616 13.6774 4.6433 -18.3080 20.8316 35.0449 1344.16 0.1014 Industrials -0.180764
1 6476 SPGI 2016-04 4.69 8.08 1.94 28402.4467 383.000 23.8403 75.2817 9.7340 16.4569 206.6725 1482.00 280.9665 Financials 0.083653
2 5369 ORCL 2016-04 4.68 6.04 0.83 166824.5824 2814.000 18.6499 3.5117 2.0127 7.9791 18.5509 10594.00 91.7661 Information Technology 0.092985
In [28]:
data = data.drop(data.columns[0], axis=1)
data.head(3)
Out[28]:
Instrument Date BESG Environmental Pillar Score BESG Governance Pillar Score BESG Social Pillar Score Current Market Cap Net Income/Net Profit (Losses) Price Earnings Ratio (P/E) Price to Book Ratio Relative Share Price Momentum Return on Assets Return on Common Equity Revenue Total Debt to Total Equity Industry Quarterly Return
0 RHI 2016-04 0.00 6.67 1.71 5011.1179 91.616 13.6774 4.6433 -18.3080 20.8316 35.0449 1344.16 0.1014 Industrials -0.180764
1 SPGI 2016-04 4.69 8.08 1.94 28402.4467 383.000 23.8403 75.2817 9.7340 16.4569 206.6725 1482.00 280.9665 Financials 0.083653
2 ORCL 2016-04 4.68 6.04 0.83 166824.5824 2814.000 18.6499 3.5117 2.0127 7.9791 18.5509 10594.00 91.7661 Information Technology 0.092985
In [29]:
#shorten the column names

data.rename(columns={'BESG Environmental Pillar Score': 'E Score', 'BESG Social Pillar Score': 'S Score',
                     'BESG Governance Pillar Score': 'G Score', 'Net Income/Net Profit (Losses)': 'NI',
                     'Price Earnings Ratio (P/E)': 'PE', 'Price to Book Ratio': 'PB',
                     'Relative Share Price Momentum': 'Momentum', 'Return on Assets': 'ROA',
                     'Return on Common Equity': 'ROE', 'Total Debt to Total Equity': 'Debt Ratio'}, inplace=True)
data.tail()
Out[29]:
Instrument Date E Score G Score S Score Current Market Cap NI PE PB Momentum ROA ROE Revenue Debt Ratio Industry Quarterly Return
7426 DHR 2022-07 0.00 6.84 3.49 187891.8607 1572.000 28.4515 4.1488 11.7926 8.2738 15.2472 7663.000 43.0851 Health Care 0.000930
7427 ACN 2022-07 6.42 7.83 2.11 191868.5670 1665.128 26.6385 8.2426 -0.2166 15.2084 33.0351 15423.656 14.6206 Information Technology -0.033505
7428 SJM 2022-07 4.59 7.78 2.76 14099.7891 109.800 19.0025 1.7313 8.7050 3.6173 7.1923 1873.000 59.0744 Consumer Staples -0.033667
7429 CFG 2022-07 0.49 6.85 4.53 17030.3073 636.000 7.2331 0.8076 -13.2617 0.9473 8.6437 2481.000 71.4983 Financials -0.037265
7430 ZTS 2022-07 4.59 7.19 3.47 69420.2682 529.000 31.7942 14.8335 -6.0819 15.0992 44.2518 2002.000 144.7662 Health Care -0.137297

Standardizing¶

The standardizing process involves transforming the data to have a mean of 0 and a standard deviation of 1. The formula for standardizing a variable $x$ is as follows:

$$ z = \frac{x - \mu}{\sigma} $$

where:

  • $z$ is the standardized value,
  • $x$ is the original value,
  • $\mu$ is the mean of the dataset, and
  • $\sigma$ is the standard deviation of the dataset.
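A minimal sketch of this group-wise standardization on toy data (the industries and values below are hypothetical):

```python
import pandas as pd

# Hypothetical values: standardize a factor within each industry group,
# mirroring the group-wise z-score z = (x - mean) / std used in the cleaning code
toy = pd.DataFrame({
    'Industry': ['Tech', 'Tech', 'Tech', 'Energy', 'Energy', 'Energy'],
    'PE': [10.0, 20.0, 30.0, 5.0, 10.0, 15.0],
})

# transform() keeps the original row alignment while computing per-group statistics
toy['PE_z'] = toy.groupby('Industry')['PE'].transform(lambda x: (x - x.mean()) / x.std())

print(toy['PE_z'].tolist())  # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

Note pandas uses the sample standard deviation (ddof=1) by default, which is why each three-value group maps exactly to -1, 0, 1 here.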

Neutralizing¶

Neutralizing is a process used to remove the impact of unwanted factors (e.g., industry and marketcap) from a set of variables. This is often done using a multiple linear regression, where the variable of interest (e.g., a factor) is regressed on the unwanted factors (e.g., industry dummies). The residuals from this regression represent the neutralized variable.

Given a dependent variable $y$ (the factor to be neutralized) and a set of independent variables $X$ (the unwanted factors, e.g., industry dummies), the linear regression model can be represented as:

$$ y = X\beta + \epsilon $$

where:

  • $\beta$ is a vector of coefficients, and
  • $\epsilon$ is the residual (neutralized) component.

By fitting this linear regression, we obtain the residual $\epsilon$, which represents the neutralized factor.

In [552]:
import math
import statsmodels.api as sm

norm_df = pd.DataFrame()
norm_df['Date'] = data['Date']
norm_df['Instrument'] = data['Instrument']
norm_df['Quarterly Return'] = data['Quarterly Return']

# factor_column, dummy_industry, and marketcap are defined in earlier cells
def removeoutlier_standardizing_neutralizing(data, factor_column=factor_column, dummy_industry=dummy_industry, marketcap=marketcap, std=3):
    factor = ['E Score', 'S Score', 'G Score', 'NI', 'PE', 'PB', 'Momentum', 'ROA', 'ROE', 'Revenue', 'Debt Ratio', 'Industry']
    a = data[factor]
    a = a.dropna().copy()

    # Apply outlier removal to columns
    for col in factor_column:
        edge_up = a.groupby('Industry')[col].transform(lambda x: x.mean() + std * x.std())
        edge_low = a.groupby('Industry')[col].transform(lambda x: x.mean() - std * x.std())
        a.loc[a[col] > edge_up, col] = edge_up
        a.loc[a[col] < edge_low, col] = edge_low

    # Apply group-wise standardization using transform()
    a[factor_column] = a.groupby('Industry')[factor_column].transform(lambda x: (x - x.mean()) / x.std())

    def neutralizing(factor, dummy_industry, marketcap):

        '''
        This function helps to isolate the impact of individual factors on the returns of a portfolio, thereby reducing the impact of biases or unintended exposures to other factors. 
        It ensures that the model is focused on the factors of interest and not driven by hidden or unwanted exposures, and in this case, the unwanted exposure is the industry.
        '''
        y = factor

        LNmarketcap = marketcap.apply(lambda x:math.log(x))

        x= pd.concat([LNmarketcap, dummy_industry], axis=1)

        result = sm.OLS(y.astype(float), x.astype(float)).fit()
        return result.resid

    #neutralize
    a[factor_column] = neutralizing(a[factor_column], dummy_industry, marketcap)

    a = a.drop('Industry', axis=1)
    return a

norm_df = norm_df.merge(removeoutlier_standardizing_neutralizing(data), left_index=True, right_index=True)
In [409]:
norm_df
Out[409]:
Date Instrument Quarterly Return E Score S Score G Score NI PE PB Momentum ROA ROE Revenue Debt Ratio
0 2016-04 RHI -0.180764 -0.794609 -0.417906 -0.682521 0.646252 -0.238948 0.039841 -1.142926 2.899639 0.345103 0.342765 -0.242635
1 2016-04 SPGI 0.083653 2.124629 -0.078859 1.207612 -0.310351 0.904303 2.276875 0.638833 3.835884 3.231890 -0.637843 1.226947
2 2016-04 ORCL 0.092985 0.085032 -1.244272 -2.352521 1.116564 -0.402256 -0.546795 -0.066309 -0.226945 -0.478035 0.203074 -0.280510
3 2016-04 LOW 0.060843 -0.635830 -0.462489 -0.462001 -0.198123 -0.109821 -0.388520 0.086307 -0.235228 -0.303302 0.238386 -0.327309
4 2016-04 MAA 0.040994 -1.278139 -0.720425 0.298226 -0.158611 -0.109704 -0.336812 1.045642 -0.116065 -0.194657 -0.352459 -0.208183
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7426 2022-07 DHR 0.000930 -1.569813 0.443858 -0.681898 -0.304783 0.296278 -0.459524 0.585317 -0.094883 -0.627106 -1.138502 -0.434464
7427 2022-07 ACN -0.033505 0.755874 0.251713 1.085647 -0.075711 -0.308817 -0.249107 -0.251594 0.790317 -0.078159 1.096902 -0.664381
7428 2022-07 SJM -0.033667 0.727459 0.150126 0.930223 0.078841 -0.151280 -0.219080 0.845005 -0.790318 -0.176290 0.157363 -0.194284
7429 2022-07 CFG -0.037265 -0.139193 1.445985 -0.677444 0.179862 -0.834155 -0.081724 -0.820258 -0.393494 -0.083843 -0.178153 -0.286319
7430 2022-07 ZTS -0.137297 0.620567 0.572084 -0.003925 -0.494337 0.661950 -0.068215 -0.638368 1.321123 0.299901 -0.879869 -0.212003

7431 rows × 14 columns

Next, we compute each stock's next-quarter return, which serves as the dependent variable in our models.
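The per-instrument loop below can equivalently be written as a vectorized groupby shift; here is a sketch on toy data (values are hypothetical):

```python
import pandas as pd

# Toy frame: two instruments, three quarters each
toy = pd.DataFrame({
    'Instrument': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Date': ['2016-04', '2016-07', '2016-10'] * 2,
    'Quarterly Return': [0.01, 0.02, 0.03, -0.01, -0.02, -0.03],
})

# Shift each instrument's return series up one quarter: the last quarter of
# every instrument becomes NaN, matching the per-instrument loop's behavior
toy['Next Q Return'] = toy.groupby('Instrument')['Quarterly Return'].shift(-1)

print(toy['Next Q Return'].tolist())  # [0.02, 0.03, nan, -0.02, -0.03, nan]
```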

In [553]:
#find next quarter return
final = pd.DataFrame()
for instr in norm_df.Instrument.unique():
    s = norm_df[norm_df['Instrument'] == instr].copy()  # copy to avoid SettingWithCopyWarning
    s['Next Q Return'] = s['Quarterly Return'].shift(-1)
    final = pd.concat([final, s])

final.head()
Out[553]:
Date Instrument Quarterly Return E Score S Score G Score NI PE PB Momentum ROA ROE Revenue Debt Ratio Next Q Return
0 2016-04 RHI -0.180764 -0.794609 -0.417906 -0.682521 0.646252 -0.238948 0.039841 -1.142926 2.899639 0.345103 0.342765 -0.242635 -0.007862
562 2016-07 RHI -0.007862 -0.788871 -0.415521 -0.679219 0.655998 -0.238228 0.040337 -1.159616 2.742283 0.340419 0.352340 -0.241018 0.288431
840 2016-10 RHI 0.288431 -0.871597 -0.449900 -0.726823 0.478711 -0.119151 0.017910 1.211123 2.674064 0.264757 0.187422 -0.264439 0.001025
996 2017-01 RHI 0.001025 -0.869063 -0.448847 -0.725365 0.484592 -0.114168 0.019109 1.291428 2.541265 0.256264 0.195962 -0.263726 -0.018431
1199 2017-04 RHI -0.018431 -0.861106 -0.445540 -0.720786 0.502390 -0.108408 0.019690 -0.556969 2.318636 0.244288 0.214258 -0.261481 0.050282
In [533]:
final = final.reset_index(drop=True).dropna()
In [412]:
final
Out[412]:
Date Instrument Quarterly Return E Score S Score G Score NI PE PB Momentum ROA ROE Revenue Debt Ratio Next Q Return
0 2016-04 RHI -0.180764 -0.794609 -0.417906 -0.682521 0.646252 -0.238948 0.039841 -1.142926 2.899639 0.345103 0.342765 -0.242635 -0.007862
1 2016-07 RHI -0.007862 -0.788871 -0.415521 -0.679219 0.655998 -0.238228 0.040337 -1.159616 2.742283 0.340419 0.352340 -0.241018 0.288431
2 2016-10 RHI 0.288431 -0.871597 -0.449900 -0.726823 0.478711 -0.119151 0.017910 1.211123 2.674064 0.264757 0.187422 -0.264439 0.001025
3 2017-01 RHI 0.001025 -0.869063 -0.448847 -0.725365 0.484592 -0.114168 0.019109 1.291428 2.541265 0.256264 0.195962 -0.263726 -0.018431
4 2017-04 RHI -0.018431 -0.861106 -0.445540 -0.720786 0.502390 -0.108408 0.019690 -0.556969 2.318636 0.244288 0.214258 -0.261481 0.050282
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7425 2021-04 DXC 0.245681 1.439174 -0.277825 0.212136 0.545462 0.210870 -0.231891 3.262569 -0.633070 -0.109188 0.653665 0.113160 -0.136877
7426 2021-07 DXC -0.136877 1.493021 -0.255447 0.243122 0.203621 -0.074785 -0.219816 0.762283 -0.554010 -0.020808 0.729533 0.166225 -0.042249
7427 2021-10 DXC -0.042249 1.507222 -0.249545 0.251294 0.505048 -0.073831 -0.217727 -1.989807 -1.196390 -0.617624 0.768243 0.148156 0.013669
7428 2022-01 DXC 0.013669 1.513221 -0.247052 0.254746 0.930386 -0.106554 -0.227228 -0.535099 -0.296670 0.169376 0.762821 0.121962 -0.071100
7429 2022-04 DXC -0.071100 1.559481 -0.227828 0.281367 0.610223 -0.266300 -0.211008 1.944347 -0.365724 0.105921 0.786748 0.141913 -0.192346

7145 rows × 15 columns

In [42]:
final.to_csv('./final.csv')

PCA¶

For our research, the E, S, and G scores are the main factors we want to interpret. To reduce complexity and remove redundant information, we use PCA to compress the remaining factors into three components. Fundamental analysis already suggests these factors fall into three groups: profitability (NI, PE, PB, ROA, ROE), momentum (Momentum), and liability (Debt Ratio). This is why we choose n=3 for PCA.
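As a sanity check on choosing three components, one can inspect PCA's explained variance ratio; the sketch below uses simulated data (all values hypothetical) with three latent drivers behind eight observed factors:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical check for n_components=3: simulate 8 correlated factor columns
# driven by 3 latent sources (echoing the profitability / momentum / liability
# grouping) and see how much variance 3 components retain.
rng = np.random.default_rng(42)
latent = rng.normal(size=(500, 3))            # 3 underlying drivers
loadings = rng.normal(size=(3, 8))            # how drivers map to 8 factors
X = latent @ loadings + 0.1 * rng.normal(size=(500, 8))  # small idiosyncratic noise

pca = PCA(n_components=3).fit(X)
print(round(float(pca.explained_variance_ratio_.sum()), 3))  # close to 1.0
```

On real factor data the retained share will be lower, but a high cumulative ratio for the first three components is the kind of evidence that supports n=3.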

In [638]:
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['axes.unicode_minus'] = False

from sklearn.decomposition import PCA

pca_factor_name = ['NI', 'PE', 'PB', 'Momentum', 'ROA', 'ROE', 'Revenue', 'Debt Ratio']

com_df = pd.DataFrame()

date_list = list(final['Date'].unique())

# PCA analysis by using four quarters (1 year) past factors to predict this quarter's return
for quar in range(4, len(final['Date'].unique())):
    s_df = final[(final['Date'] >= date_list[quar - 4]) & (final['Date'] < date_list[quar])]

    pca = PCA(n_components=3)

    pca.fit(np.array(s_df[pca_factor_name].values))
    weights = pca.components_

    curr = final[final['Date'] == date_list[quar]]

    split_com = pd.DataFrame()
    split_com['Date'] = curr['Date']
    split_com['Instrument'] = curr['Instrument']

    # Calculate PCA factors
    pca_factors = np.dot(curr[pca_factor_name].values, weights.T)
    for i in range(3):
        split_com[f'PCA Factor {i + 1}'] = pca_factors[:, i]

    com_df = pd.concat([com_df, split_com])  # DataFrame.append was removed in pandas 2.0

com_df['Next Q Return'] = final['Next Q Return']
In [639]:
com_df['Quarterly Return'] = final['Quarterly Return']
com_df['E Score'] = final['E Score']
com_df['S Score'] = final['S Score']
com_df['G Score'] = final['G Score']

com_df
Out[639]:
Date Instrument PCA Factor 1 PCA Factor 2 PCA Factor 3 Next Q Return Quarterly Return E Score S Score G Score
1199 2017-04 RHI 0.852194 -1.940168 0.434228 0.050282 -0.018431 -0.861106 -0.445540 -0.720786
1297 2017-04 SPGI 4.612328 -2.510695 3.296044 0.070690 0.116644 2.029091 -0.118562 1.152635
1208 2017-04 ORCL -0.743416 -0.238669 -0.603372 0.108835 0.065743 0.121018 -1.189149 -2.045449
1413 2017-04 LOW -0.560841 0.274964 -0.065589 -0.088721 0.173421 -0.594675 -0.473439 -0.360427
1389 2017-04 MAA -1.021647 0.370841 0.204787 0.014234 0.035777 -1.413413 -0.776643 0.220382
... ... ... ... ... ... ... ... ... ... ...
7262 2022-07 WM -0.452703 -0.290607 1.159184 NaN 0.047261 0.499012 0.381293 1.670612
7252 2022-07 CAH 0.433156 -1.607065 -1.295089 NaN 0.275684 1.153382 -0.063716 0.859551
7407 2022-07 AVY 0.548646 -0.349065 0.162362 NaN -0.060558 0.994782 -0.558506 -0.350338
7249 2022-07 CMS 1.001439 -0.688955 0.923604 NaN -0.137185 0.305028 0.946862 2.218818
7368 2022-07 DXC 0.106735 -0.544495 -0.999375 NaN -0.192346 1.631655 -0.197834 0.322900

6291 rows × 10 columns

In [185]:
com_df.to_csv('./com_df.csv')
In [646]:
import seaborn as sns
import matplotlib.pyplot as plt

df = com_df.drop(com_df.columns[6], axis=1)  # drop 'Quarterly Return'
df = df.dropna()
df
Out[646]:
Date Instrument PCA Factor 1 PCA Factor 2 PCA Factor 3 Next Q Return E Score S Score G Score
1199 2017-04 RHI 0.852194 -1.940168 0.434228 0.050282 -0.861106 -0.445540 -0.720786
1297 2017-04 SPGI 4.612328 -2.510695 3.296044 0.070690 2.029091 -0.118562 1.152635
1208 2017-04 ORCL -0.743416 -0.238669 -0.603372 0.108835 0.121018 -1.189149 -2.045449
1413 2017-04 LOW -0.560841 0.274964 -0.065589 -0.088721 -0.594675 -0.473439 -0.360427
1389 2017-04 MAA -1.021647 0.370841 0.204787 0.014234 -1.413413 -0.776643 0.220382
... ... ... ... ... ... ... ... ... ...
7107 2022-04 WM -0.530063 0.432105 -0.444899 0.047261 0.513191 0.387185 1.678771
6970 2022-04 CAH 0.382502 -1.287591 2.399141 0.275684 1.222790 -0.034871 0.899493
6998 2022-04 AVY 0.463544 -0.144213 -0.221545 -0.060558 0.970929 -0.568420 -0.364065
7078 2022-04 CMS 1.250524 -0.300628 -1.373276 -0.137185 0.255026 0.926082 2.190044
6889 2022-04 DXC 0.073055 -0.924805 0.293358 -0.192346 1.559481 -0.227828 0.281367

6005 rows × 9 columns

Adding COVID-19 into consideration

In [647]:
# Create a dummy variable for Covid: 1 if '2020-01' <= date < '2022-01', otherwise 0.
# We assume Covid affected the stock market in 2020 and 2021.
# We mark 2022 as unaffected for two reasons: (1) a single event that has a start must have an end, and
# (2) our test dataset only contains dates after 2021-01; if every date after 2021-01 had Covid = 1,
# the dummy would be collinear with the intercept in the linear regression.
df['Covid'] = ((df['Date'] >= '2020-01') & (df['Date'] < '2022-01')).astype(int)

X. Final Dataset¶

The final dataset consists of the factors 'E Score', 'S Score', 'G Score', 'PCA Factor 1', 'PCA Factor 2', 'PCA Factor 3', and 'Covid'.

In [758]:
import seaborn as sns
import matplotlib.pyplot as plt

factor_col = ['E Score', 'S Score', 'G Score', 'PCA Factor 1', 'PCA Factor 2', 'PCA Factor 3', 'Covid']

# Assuming your DataFrame is named df
correlation_matrix = df[factor_col].corr()

# Create a heatmap using Seaborn
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, fmt=".2f", linewidths=0.5)

# Customize the plot
plt.title("Correlation Matrix", fontsize=20)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)

# Show the plot
plt.show()
In [762]:
df.describe()
Out[762]:
PCA Factor 1 PCA Factor 2 PCA Factor 3 Next Q Return E Score S Score G Score Covid
count 6005.000000 6005.000000 6005.000000 6005.000000 6005.000000 6005.000000 6005.000000 6005.000000
mean -0.003506 0.029440 -0.014076 0.024194 0.053216 0.050056 0.015121 0.381016
std 1.494748 1.131171 1.034577 0.158069 0.950474 1.000145 0.979416 0.485677
min -5.313446 -5.422237 -4.328957 -0.812361 -2.681774 -3.031672 -3.047624 0.000000
25% -0.703778 -0.607372 -0.612219 -0.060558 -0.623677 -0.680531 -0.575794 0.000000
50% -0.339823 -0.070666 -0.077146 0.025691 -0.016866 -0.134695 0.109163 0.000000
75% 0.146190 0.509475 0.462595 0.109748 0.654640 0.669262 0.697301 1.000000
max 16.026433 9.816880 8.207443 0.957227 3.337515 3.264825 2.825667 1.000000

XI. Linear Regression, Random Forest, SVR, and Gradient Boost¶

Here, we first divide the dataset into a training set (data on or before 2021-01) and a test set (data after 2021-01).

In [198]:
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
In [648]:
train_df = df[df['Date']<='2021-01']
test_df = df[df['Date']>'2021-01']
In [650]:
#factor_col = ['E Score', 'S Score', 'G Score', 'NI', 'PE', 'PB', 'Momentum', 'ROA', 'ROE', 'Revenue', 'Debt Ratio', 'Covid']


X_train = train_df[factor_col].values
Y_train = train_df['Next Q Return']

X_test = test_df[factor_col].values
Y_test = test_df['Next Q Return']

Linear Regression

In [651]:
X_train1 = sm.add_constant(X_train, prepend = True)
LR1 = sm.OLS(Y_train, X_train1)
LR = LR1.fit()

X_test1 = sm.add_constant(X_test, prepend = True)
LR_test = LR.predict(X_test1)

Random Forest

In [653]:
randomforest_regressor = RandomForestRegressor(n_estimators=300)

model_RF = randomforest_regressor.fit(X_train, Y_train)
RF_test = model_RF.predict(X_test)

SVR

In [655]:
svr = SVR(kernel='rbf')
model_SVR = svr.fit(X_train, Y_train)

svr_test = model_SVR.predict(X_test)

Gradient Boost

In [720]:
gbdt = GradientBoostingRegressor(n_estimators=100, max_depth=5)
model_GB = gbdt.fit(X_train, Y_train)

gbdt_test = model_GB.predict(X_test)

XII. Result¶

Q2: Are ESG Scores Significant in Stock Prediction?¶

Below is the result of our linear regression model. Variables x1 to x7 represent 'E Score', 'S Score', 'G Score', 'PCA Factor 1', 'PCA Factor 2', 'PCA Factor 3', and 'Covid', respectively.

In the linear regression model, only the S-score (x2) is statistically significant among the ESG scores.

In [737]:
LR.summary()
Out[737]:
OLS Regression Results
Dep. Variable: Next Q Return R-squared: 0.112
Model: OLS Adj. R-squared: 0.111
Method: Least Squares F-statistic: 82.20
Date: Sun, 23 Apr 2023 Prob (F-statistic): 6.56e-113
Time: 12:40:27 Log-Likelihood: 2111.9
No. Observations: 4576 AIC: -4208.
Df Residuals: 4568 BIC: -4156.
Df Model: 7
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const -0.0004 0.003 -0.136 0.892 -0.006 0.005
x1 0.0001 0.003 0.050 0.960 -0.005 0.005
x2 -0.0069 0.003 -2.663 0.008 -0.012 -0.002
x3 0.0001 0.002 0.043 0.966 -0.005 0.005
x4 0.0009 0.002 0.598 0.550 -0.002 0.004
x5 0.0036 0.002 1.737 0.083 -0.000 0.008
x6 -0.0014 0.002 -0.623 0.533 -0.006 0.003
x7 0.1174 0.005 23.012 0.000 0.107 0.127
Omnibus: 428.428 Durbin-Watson: 1.472
Prob(Omnibus): 0.000 Jarque-Bera (JB): 1585.733
Skew: -0.425 Prob(JB): 0.00
Kurtosis: 5.756 Cond. No. 3.49


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

As for the random forest:

The E-score, S-score, and G-score feature importances are all slightly above the uniform baseline of 1/7 ≈ 0.143.

In [738]:
# Get feature importances
RF_importances = model_RF.feature_importances_

# Print the feature importances
for feature, importance in zip(factor_col, RF_importances):
    print(f"{feature}: {importance}")
E Score: 0.15306876895814198
S Score: 0.1456452103484826
G Score: 0.14323780823727802
PCA Factor 1: 0.13957025117922972
PCA Factor 2: 0.15753055992669504
PCA Factor 3: 0.1513779006058136
Covid: 0.10956950074435894

As for gradient boosting:

Only the E-score importance is slightly above the 1/7 baseline.

In [739]:
importances2 = model_GB.feature_importances_

# Print the feature importances
for feature, importance in zip(factor_col, importances2):
    print(f"{feature}: {importance}")
E Score: 0.14613749118630107
S Score: 0.12413907467663009
G Score: 0.12878085983271992
PCA Factor 1: 0.08660388796214366
PCA Factor 2: 0.11543424855423633
PCA Factor 3: 0.13374951829969456
Covid: 0.2651549194882743

Q3: Are machine learning models incorporating ESG factors useful in stock prediction?¶

We first merge all results into one dataframe. Then we analyze the mean squared error of each model on the test dataset to see how they perform.
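Concretely, the mean squared error we compare across models is:

$$ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 $$

where $y_i$ is the realized next-quarter return and $\hat{y}_i$ is a model's prediction.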

In [721]:
pred_df = pd.DataFrame()

pred_df['Instrument'] = test_df['Instrument']
pred_df['Date'] = test_df['Date']
pred_df['LR'] = LR_test

pred_df['RF'] = RF_test

pred_df['SVR'] = svr_test

pred_df['GB'] = gbdt_test

pred_df['Actual Return'] = test_df['Next Q Return']
In [658]:
pred_df
Out[658]:
Instrument Date LR RF SVR GB Actual Return
5764 RHI 2021-04 0.123420 0.168349 0.158057 0.175005 0.127683
5929 SPGI 2021-04 0.123906 0.129902 0.138656 0.106414 0.035181
5883 ORCL 2021-04 0.126573 0.167482 0.105094 0.077113 0.131953
5972 LOW 2021-04 0.118705 0.120991 0.282981 0.113571 -0.018140
5946 MAA 2021-04 0.114827 0.143250 0.112849 0.124268 0.108835
... ... ... ... ... ... ... ...
7107 WM 2022-04 -0.001145 -0.000266 -0.015315 -0.004667 0.047261
6970 CAH 2022-04 -0.007403 -0.027068 0.175226 -0.030647 0.275684
6998 AVY 2022-04 0.003846 -0.006445 0.004178 0.018158 -0.060558
7078 CMS 2022-04 -0.004518 0.001014 0.066760 0.021595 -0.137185
6889 DXC 2022-04 -0.002212 0.004876 0.000648 0.004377 -0.192346

1429 rows × 7 columns

In [722]:
pred_df = pred_df.sort_values(by=['Instrument', 'Date'])
In [660]:
pred_df
Out[660]:
Instrument Date LR RF SVR GB Actual Return
5960 A 2021-04 0.110206 0.160231 0.115178 0.178601 0.146588
6059 A 2021-07 0.112734 0.131651 0.160899 0.163970 0.027801
6293 A 2021-10 0.101839 0.184062 0.105382 0.069218 -0.115372
6764 A 2022-01 -0.017033 -0.088556 -0.001623 -0.084798 -0.143913
7015 A 2022-04 -0.019506 -0.075513 -0.019761 -0.098354 0.124340
... ... ... ... ... ... ... ...
5820 ZTS 2021-04 0.117553 0.057486 -0.009100 0.071518 0.041747
6193 ZTS 2021-07 0.120532 0.098244 0.146891 0.123646 0.256979
6572 ZTS 2021-10 0.123629 0.235019 0.231796 0.210828 -0.227185
6768 ZTS 2022-01 0.002745 0.018562 0.012963 0.034367 -0.088552
6882 ZTS 2022-04 0.000819 0.070902 0.029919 0.063524 -0.137297

1429 rows × 7 columns

In [723]:
from sklearn.metrics import mean_squared_error
Y_test = Y_test.dropna() #drop out the last row for next return which is NAN
mse_lr = mean_squared_error(Y_test, LR_test)
mse_rf = mean_squared_error(Y_test, RF_test)
mse_svr = mean_squared_error(Y_test, svr_test)
mse_gb = mean_squared_error(Y_test, gbdt_test)
In [724]:
print(f"Mean Squared Error for Linear Regression: {mse_lr}")
print(f"Mean Squared Error for Random Forest: {mse_rf}")
print(f"Mean Squared Error for Support Vector Regression: {mse_svr}")
print(f"Mean Squared Error for Gradient Boosting: {mse_gb}")
Mean Squared Error for Linear Regression: 0.023257777000994646
Mean Squared Error for Random Forest: 0.024504062986203536
Mean Squared Error for Support Vector Regression: 0.024717368400565383
Mean Squared Error for Gradient Boosting: 0.02392708622907875

It looks like linear regression performs the best. This is reasonable: linear regression's objective function solely minimizes mean squared error, while the other models incorporate additional mechanisms, such as regularization, that aim to avoid overfitting. However, our models use far fewer factors than real asset management firms, which may build models on 100+ or even 1000+ factors. Such complex models are easily exposed to overfitting, whereas our much simpler models are less prone to it. In this setting, linear regression can indeed perform better on the MSE metric.

Q4: Can ESG machine learning modelling be an investment strategy?¶

Our goal is not just to predict stock returns but to create a portfolio that beats the market. At the start of each quarter, we select the 5 stocks with the highest predicted next-quarter return under each model and sell them at the next quarter. We construct an equal-weight portfolio of these 5 stocks (i.e., investing the same amount of money in each) and compare the portfolios' performance against each other and against the S&P 500 index.
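The equal-weight rule can be written as:

$$ r_{p,t+1} = \frac{1}{K}\sum_{i \in \text{Top-}K(t)} r_{i,t+1}, \qquad K = 5 $$

where $\text{Top-}K(t)$ is the set of the $K$ stocks with the highest predicted next-quarter return at quarter $t$, and $r_{i,t+1}$ is stock $i$'s realized return over the following quarter.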

In [796]:
def get_ret(name):
    ret=[]

    for quar in pred_df.Date.unique():
        s_df = pred_df[pred_df['Date'] == quar]
        K= 5
        s_df = s_df.sort_values(by=name, ascending=False)
        ret1 = s_df.iloc[:K]['Actual Return'].mean()   #Equal Weight

        ret.append(ret1)
    return ret

LR_ret = get_ret(name="LR")
RF_ret = get_ret(name="RF")
SVR_ret = get_ret(name="SVR")
GBDT_ret = get_ret(name='GB')
In [726]:
all_ret = pd.DataFrame()

all_ret['Date'] = pred_df.Date.unique()
all_ret['Date'] = pd.to_datetime(all_ret['Date'])
all_ret['Date'] = all_ret['Date'] + pd.DateOffset(months=3)

all_ret['LR'] = LR_ret
all_ret['RF'] = RF_ret
all_ret['SVR'] = SVR_ret
all_ret['GBDT'] = GBDT_ret
In [728]:
all_ret_backup = all_ret
all_ret.head()
Out[728]:
Date LR RF SVR GBDT
0 2021-07-01 -0.023292 -0.041212 0.008082 0.020397
1 2021-10-01 0.088594 0.166775 0.115003 0.166775
2 2022-01-01 -0.098635 0.121873 -0.033019 0.028023
3 2022-04-01 -0.205070 -0.112781 -0.099513 -0.048980
4 2022-07-01 -0.070502 -0.028746 -0.017714 -0.094978

Below, we add in S&P 500 index return and merge it with our models' result.

In [141]:
sp500 = pd.read_excel('./sp500return.xlsx')
In [143]:
sp500 = sp500.T
In [156]:
sp5001 = sp500.iloc[2:,:]
sp5001 = sp5001.reset_index()
sp5001.columns = ['Date', 'Last Price']
In [439]:
sp5001
Out[439]:
Date Last Price
0 03/31/2016 2059.74
1 06/30/2016 2098.86
2 09/30/2016 2168.27
3 12/30/2016 2238.83
4 03/31/2017 2362.72
5 06/30/2017 2423.41
6 09/29/2017 2519.36
7 12/29/2017 2673.61
8 03/30/2018 2640.87
9 06/29/2018 2718.37
10 09/28/2018 2913.98
11 12/31/2018 2506.85
12 03/29/2019 2834.4
13 06/28/2019 2941.76
14 09/30/2019 2976.74
15 12/31/2019 3230.78
16 03/31/2020 2584.59
17 06/30/2020 3100.29
18 09/30/2020 3363
19 12/31/2020 3756.07
20 03/31/2021 3972.89
21 06/30/2021 4297.5
22 09/30/2021 4307.54
23 12/31/2021 4766.18
24 03/31/2022 4530.41
25 06/30/2022 3785.38
26 09/30/2022 3585.62
27 12/30/2022 3839.5
In [707]:
# Compute the S&P 500 quarterly return from the quarter-end index levels
sp5002 = sp5001.set_index('Date').pct_change()

sp5003 = sp5002.tail(7)
sp5003
Out[707]:
Last Price
Date
06/30/2021 0.081706
09/30/2021 0.002336
12/31/2021 0.106474
03/31/2022 -0.049467
06/30/2022 -0.164451
09/30/2022 -0.052771
12/30/2022 0.070805
In [729]:
sp5004 = sp5003.reset_index()['Last Price']
all_return = all_ret.merge(sp5004, left_index=True, right_index=True)
all_return = all_return.dropna()
In [745]:
all_return = all_return.set_index('Date')
all_return
Out[745]:
LR RF SVR GBDT Last Price
Date
2021-07-01 -0.023292 -0.041212 0.008082 0.020397 0.081706
2021-10-01 0.088594 0.166775 0.115003 0.166775 0.002336
2022-01-01 -0.098635 0.121873 -0.033019 0.028023 0.106474
2022-04-01 -0.205070 -0.112781 -0.099513 -0.048980 -0.049467
2022-07-01 -0.070502 -0.028746 -0.017714 -0.094978 -0.164451

Lastly, we plot the return of our stock portfolio from each model and the return of actual S&P500 index return.

In [769]:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Set figure size
plt.figure(figsize=(12,6))

# Plot LR
plt.plot(all_return['LR'], linestyle='--', lw=1.5, color='purple', label='LR')
# Plot RF
plt.plot(all_return['RF'], linestyle='-', lw=4, color='blue', label='RF')
# Plot SVR
plt.plot(all_return['SVR'], linestyle='--', lw=1.5, color='orange', label='SVR')
# Plot GBDT (distinct color so it is not confused with the purple LR line)
plt.plot(all_return['GBDT'], linestyle='-', lw=4, color='green', label='GB')
# Plot S&P500 Index
plt.plot(all_return['Last Price'], linestyle='-.', lw=1.5, color='gray', label='S&P500 Index')

# Add shaded area between -0.2 and the S&P500 line
plt.fill_between(all_return.index, -0.2, all_return['Last Price'], color='red', alpha=0.2)

# Customize legend
plt.legend(fontsize=14)

# Create a list of dates with a 3-month interval
start_date = pd.Timestamp('2021-07-01')
end_date = pd.Timestamp('2022-07-01')
date_range = pd.date_range(start_date, end_date, freq='3M')

# Set custom x-axis tickers
ax = plt.gca()
ax.set_xticks(date_range)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))

# Show the plot
plt.show()

We can see that random forest and gradient boosting are better at constructing stock portfolios. However, none of our models captures significant excess return over the S&P 500.

XIII - Discussion¶

In this project, we try to answer four questions:

  1. Does ESG explain Stock Volatilities? The relationship between ESG scores and stock volatility only exists in certain industries. In the Financials and Real Estate sectors, there is a positive correlation between ESG score and Beta, suggesting that companies with higher ESG scores tend to have higher betas and are more sensitive to market fluctuations. In contrast, in the Healthcare sector, there is a negative correlation between ESG score and Beta, suggesting that companies with higher ESG scores tend to have lower betas and are less sensitive to market fluctuations. In these three sectors, 15% ~ 21% of the variations of Beta are explained by the variations of ESG Score. Further research may be needed to fully understand the relationship between ESG scores and stock volatilities.
  2. Are ESG factors significant in stock prediction? According to the linear regression, only the S-score is significant in terms of p-value, and it is negatively correlated with next-quarter stock returns. The E-score and G-score are both positively correlated with returns, but insignificantly so. Our result is in line with several scholars' research (Gregory, 2021; Visconti, 2021; Tharavanij, 2021). The reasons behind this could be as follows:
  • As for the negative correlation between the S-score and expected return: it seems odd at first glance, since we originally expected all ESG scores to be positively correlated with expected returns. We believe it may be because improving the S-score can actually hurt companies. Many S&P 500 companies operate in cost-sensitive industries such as mining or manufacturing, whose core advantage lies in providing cheap but good products. To do so, they cut labor costs as much as possible, which keeps their S-scores low. If these companies now invest heavily in improving workers' welfare, it would be ethically admirable, but from a financial perspective it would strain their business model. One might object that this applies only to mining or manufacturing companies, and that industries without direct production should not suffer from it. In fact, however, nearly all companies produce something: thanks to the globalization boom, many simply outsource production to places where costs are lower, and workers in those places still count toward the company's S-score. Moreover, most industries become cost-sensitive as they mature. Therefore, improving the S-score may erode a company's core advantage and thereby lower stock returns.
  • As for the positive but insignificant correlation between the E-score / G-score and expected returns: from a cash-flow perspective, investing in activities that improve the E-score or G-score may increase a company's future cash flows through, for instance, lower use of raw materials, a lower cost of capital (because the company is less risky), or improved governance efficiency. These would result in a higher valuation and thus a higher stock return. However, investing in these activities may also cause companies to miss lucrative expansion opportunities or even, in some cases, over-invest in ESG. This tension is consistent with our model's mixed result.
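As a minimal sketch of the sector-level analysis in point 1 (using synthetic data, not our actual sector samples), the share of Beta's variance explained by the ESG score is simply the R² of a univariate OLS fit:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic sector data (illustrative only): ESG scores and stock betas
esg = rng.uniform(20, 80, size=120)
beta = 0.8 + 0.004 * esg + rng.normal(scale=0.08, size=120)  # positive link

# OLS fit of Beta on ESG score, then the share of variance explained (R^2)
slope, intercept = np.polyfit(esg, beta, 1)
pred = intercept + slope * esg
r2 = 1.0 - np.sum((beta - pred) ** 2) / np.sum((beta - beta.mean()) ** 2)
```

A positive `slope` with a moderate `r2` corresponds to the Financials / Real Estate pattern described above; a negative slope would correspond to Healthcare.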

It is reasonable to suspect that the relationship between ESG scores and returns is non-linear: one can think of a balance point between over-investing and necessary investing in ESG activities. This is also why we use random forest, gradient boosting, and support vector regression (kernel: RBF), models that are non-linear in nature. In the gradient boosting model, the E-score's importance is slightly above average. This is expected, as the E-score is normally the factor best known to ordinary investors, given how prominent the green transition has become in daily life. Every company talks about the green transition, driven both by a willingness to protect the environment and by government regulation.
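The tree models' ability to pick up such non-linear effects can be inspected directly through their feature importances. A hedged sketch (synthetic stand-in data, not our actual factor set) using scikit-learn's `GradientBoostingRegressor`:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
n = 500
# Synthetic stand-ins for three factors (illustrative only, not real data)
X = rng.normal(size=(n, 3))  # columns: "E", "S", "G"
# A quadratic dependence on the first column mimics a "balance point"
# between under- and over-investing in ESG
y = -0.5 * X[:, 0] ** 2 + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=n)

gbdt = GradientBoostingRegressor(random_state=0).fit(X, y)
importances = dict(zip(["E", "S", "G"], gbdt.feature_importances_))
# The quadratic driver ("E" here) dominates the importances, even though a
# linear regression would assign it a coefficient near zero
```

This is exactly the kind of relationship a linear model misses: the quadratic factor has near-zero linear correlation with `y`, yet the boosted trees rank it as the most important feature.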

  3. Are machine learning models incorporating ESG factors useful in stock prediction? According to the mean-squared-error comparison, linear regression predicts best on the test dataset, followed by gradient boosting.

  4. Can machine learning models incorporating ESG factors form a good investment strategy? None of the portfolios yielded by the four models improves dramatically on the S&P 500 index. Random forest and gradient boosting do perform better than SVR and linear regression, but they still outperform the S&P 500 in only 3 of 5 time periods, a 60% success rate. As we can see below, random forest outperforms the S&P 500 by around 2% and gradient boosting by around 1.4% across the five-quarter horizon. Can it be a good investment strategy? I personally would not throw my money at our models this time. :)

In [768]:
all_return.mean()
Out[768]:
LR           -0.061781
RF            0.021182
SVR          -0.005432
GBDT          0.014248
Last Price   -0.004680
dtype: float64

It can be said that any investment strategy derived from the four models is not especially good: it does not capture outsized returns. Arbitrage theory offers one explanation: our model is too simple compared to those built by top asset-management firms, which employ teams of PhDs who do this for a living and have already developed models that capture the benefits offered by our study's factors. This is why the adjusted R-squared of the linear regression is only 0.111 (we did work to improve it: the original was 0.001, a truly astonishing number), and also why our investment strategy, for example the one derived from random forest, yields only around 2% more return than the S&P 500. Still, this project taught us a great deal about quantitative finance, and it inspired us to consider quantitative research as a career path.
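For concreteness, the average outperformance implied by the printed means can be recomputed directly. A small sketch that re-uses the `all_return.mean()` output above as literals:

```python
# Mean returns per model, copied from the all_return.mean() output above
mean_returns = {
    "LR": -0.061781,
    "RF": 0.021182,
    "SVR": -0.005432,
    "GBDT": 0.014248,
    "SP500": -0.004680,  # the 'Last Price' column, i.e. the index benchmark
}
benchmark = mean_returns.pop("SP500")
# Excess mean return of each model over the S&P 500 benchmark
excess = {model: r - benchmark for model, r in mean_returns.items()}
best = max(excess, key=excess.get)  # the random forest strategy
```

Only RF and GBDT have positive excess means; LR and SVR underperform the index on average over this window.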

XIV - Conclusion, Limitations and Future Research¶

Our research provides companies with insight into how their performance on ESG activities may affect their market capitalization. It may also serve as a caution for companies that over-invest in ESG, or that invest so heavily in the S-score that their business models fall apart. It is increasingly important to find the balance point between over-investing in ESG and simply neglecting it.

We also see that the E-score and G-score are not significant. This might be because many people simply do not know what the E-score and G-score are: everybody talks about ESG, yet few look at, for example, SASB's ESG disclosure framework to figure out what ESG really is. One insight from this research is that policy makers should help investors, and even the general public, understand what ESG scores really are and what they mean for their own and their children's future.

Moreover, for quantitative researchers, our study suggests that non-linear multifactor modelling of ESG may perform better for building investment strategies, probably due to the non-linear nature of ESG scores.

One future research direction stems from the fact that there are different ESG rating agencies. This time, we only used Bloomberg ESG for our modelling. However, there are many other well-known ESG rating agencies, such as MSCI, S&P Global, and Sustainalytics, which all use different criteria and weighting methods to calculate ESG scores. For example, Bloomberg's framework covers air quality, climate change, water, energy management, material waste, health and safety, audit risk, diversity, compensation, board structure, and shareholder rights (Bloomberg, n.d.), while Refinitiv (2021) considers resource use, emissions, innovation, workforce, human rights, community, and product responsibility for its social scores, plus management, shareholders, corporate social responsibility, and controversy scores in its analysis. Bloomberg, unlike some other agencies, does not incorporate innovation, product responsibility, and similar criteria.

Thus, future research could incorporate ESG scores from other agencies into the modelling to see whether they are effective in predicting stock returns. Another direction is to repeat the study in other markets, such as China or Europe, to see whether investors there care about ESG.

XV - References¶

Bauer, R., Koedijk, K., & Otten, R. (2005). International evidence on ethical mutual fund performance and investment style. Journal of Banking and Finance, 29(7), 1751–1767. https://doi.org/10.1016/j.jbankfin.2004.06.035

Bloomberg ESG. (n.d.). Retrieved April 9, 2023, from https://www.bloomberg.com/professional/dataset/global-environmental-social-governance-data/

Chen, Y., Lo, M. C., Wang, S., & Lin, Y. (2020). ESG Disclosure, REIT Debt Financing, and Firm Value. Journal of Real Estate Research, 42(3), 349-378. doi: 10.1080/08965882.2018.1560396 https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3886058

Consolandi, C., Jaiswal-Dale, A., Poggiani, E., & Vercelli, A. (2009). Global Standards and Ethical Stock Indexes: The Case of the Dow Jones Sustainability Stoxx Index. Journal of Business Ethics, 87(S1), 185–197. https://doi.org/10.1007/s10551-008-9793-1

Gavrilakis, N., & Floros, C. (2023). ESG performance, herding behavior and stock market returns: evidence from Europe. Operational Research, 23(1). https://doi.org/10.1007/s12351-023-00745-1

Ghorbani, M. (2018, March 13). Stock Price Prediction using Principle Components. arXiv.org. https://arxiv.org/abs/1803.05075

Gilad-Bachrach, R. (n.d.). Why does gradient boosting generally outperform random forests? Quora. https://www.quora.com/Why-does-gradient-boosting-generally-outperform-random-forests

Gregory, R. L. (2021a). ESG activities and firm cash flow. Global Finance Journal, 52, 100698. https://doi.org/10.1016/j.gfj.2021.100698

He, F., Qin, S., Liu, Y., & Wu, J. (2022). CSR and idiosyncratic risk: Evidence from ESG information disclosure. Finance Research Letters, 49, 102936. https://doi.org/10.1016/j.frl.2022.102936

Henrique, B. M., Sobreiro, V. A., & Kimura, H. (2018). Stock price prediction using support vector regression on daily and up to the minute prices. The Journal of Finance and Data Science, 4(3), 183–201. https://doi.org/10.1016/j.jfds.2018.04.003

Jiang, S., & Jin, X. (2021). Effects of investor sentiment on stock return volatility: A spatio-temporal dynamic panel model. Economic Modelling, 97, 298–306. https://doi.org/10.1016/j.econmod.2020.04.002

Jin, I. H. (2018). Is ESG a systematic risk factor for US equity mutual funds? Journal of Sustainable Finance & Investment, 8(1), 72–93. https://doi.org/10.1080/20430795.2017.1395251

Justice, P., & Hale, J. (2016). MorningStar. https://s21.q4cdn.com/198919461/files/doc_downloads/press_kits/2016/Morningstar-Sustainability-Rating-Methodology.pdf

Lok, L. (2022). Decision Trees, Random Forests and Gradient Boosting: What’s the Difference? Leon Lok. https://leonlok.co.uk/blog/decision-trees-random-forests-gradient-boosting-whats-the-difference/

MSCI. (2017). ESG Investing: Considerations for the U.S. Market. https://www.msci.com/documents/10199/cbc27309-8157-4589-9cc0-00734bca6a6b

Murata, R., & Hamori, S. (2021). ESG Disclosures and Stock Price Crash Risk. Journal of Risk and Financial Management, 14(2), 70. https://doi.org/10.3390/jrfm14020070

Naffa, H., & Fain, M. (2020). Performance measurement of ESG-themed megatrend investments in global equity markets using pure factor portfolios methodology. PLOS ONE, 15(12), e0244225. https://doi.org/10.1371/journal.pone.0244225

Nazir, M., Akbar, A., Akbar, A., Poulovo, P., Hussain, A., & Qureshi, M. I. (2021a). The nexus between corporate environment, social, and governance performance and cost of capital: evidence from top global tech leaders. Environmental Science and Pollution Research, 29(15), 22623–22636. https://doi.org/10.1007/s11356-021-17362-0

Refinitiv. (2021). ESG Scores Methodology. Retrieved April 24, 2023, from https://www.refinitiv.com/content/dam/marketing/en_us/documents/methodology/refinitiv-esg-scores-methodology.pdf

Renneboog, L., Ter Horst, J., & Zhang, C. C. (2008). Socially responsible investments: Institutional aspects, performance, and investor behavior. Journal of Banking and Finance, 32(9), 1723–1742. https://doi.org/10.1016/j.jbankfin.2007.12.039

Spirova, T. (2023). ESG Scores and Stock Price Returns: How Do ESG Scores Impact Stock Returns in Netherlands. University of Twente. http://essay.utwente.nl/94564/1/Spirova_MA_BMS.pdf

SSE Initiative. (n.d.). ESG guidance database. Retrieved from https://sseinitiative.org/esg-guidance-database/

SVR Modeling and Parameter Optimization for Financial Time Series Forecasting. (2022, December 11). IEEE Conference Publication | IEEE Xplore. https://ieeexplore.ieee.org/document/10016054

Triguero, Á., Moreno-Mondéjar, L., & Davia, M. A. (2016). Leaders and Laggards in Environmental Innovation: An Empirical Analysis of SMEs in Europe. Business Strategy and the Environment, 25(1), 28–39. https://doi.org/10.1002/bse.1854

US Securities and Exchange Commission. (2021). Climate and ESG risks and opportunities. https://www.sec.gov/sec-response-climate-and-esg-risks-and-opportunities

Weber, Olaf. (2014). “The Financial Sector's Impact on Sustainable Development.” Journal of Sustainable Finance & Investment 4 (1): 1–8. doi:10.1080/20430795.2014.887345. https://www.tandfonline.com/doi/abs/10.1080/20430795.2014.887345

World Federation of Exchanges. (2021). Monthly Statistics. https://www.world-exchanges.org/home/index.php/statistics/monthly-reports
